Journal of Applied Psychology

Editor: Donald G. Paterson, University of Minnesota





Table of Contents


The Quantification of an Industrial Employee Survey. I. Method: F. J. Harris
The Quantification of an Industrial Employee Survey. II. Application: F. J. Harris
“Item Analysis” Versus “Scale Analysis”: P. H. Kriedt and K. E. Clark
The Airline Pilot’s Job: T. Gordon
Factors Related to Life Insurance Selling: D. F. Kahn and J. M. Hadley
A Window-Stencil Method for Scoring the Strong Vocational Interest Blank
(Men): J. E. Greene, R. T. Osborne and W. B. Sanders
A Short Test of Mental Ability: J. L. Otis and D. J. Chesler
Studies in Job Evaluation: 8. The Reliability of an Abbreviated Job Evaluation
System: C. H. Lawshe and P. C. Farbro
Odor Selection, Preferences and Identification: B. Locke and C. H. Grimm
Prediction of Female Readership of Magazine Articles: E. Panuorr
Special Review: The Third Mental Measurements Yearbook: E. D. Sisson





Published Bi-monthly by The American Psychological Association, Inc. 
Prince and Lemon Sts., Lancaster, Pa., and
1515 Massachusetts Ave., NW, Washington 5, D.C. 
Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879

Acceptance for mailing at the special rate of postage provided for in the Act of February 28, 1925,
embodied in paragraph 4, section 538, P. L. and R., authorized October 10.
Copyright, 1949, by The American Psychological Association, Inc. 





Published by the American Psychological Association, Inc. Annual subscription,
$6.00; single copies, $1.25. Subscriptions, orders, and other business
communications should be addressed to the American Psychological Association,
Inc., 1515 Massachusetts Avenue, N. W., Washington 5, D. C.
Articles for publication should be sent to the Editor, Donald G. Paterson,
Department of Psychology, University of Minnesota, Minneapolis 14, Minn.


This journal gives prompt consideration to manuscripts reporting original
investigations in any field of applied psychology except clinical and
consulting psychology. A descriptive or theoretical article is occasionally
accepted if it deals in a distinctive manner with a problem of applied
psychology. The policy is, however, to favor papers dealing with the following
fields: Vocational diagnosis and occupational guidance; educational diagnosis,
prediction and guidance at the secondary school level and higher; personnel
selection, training, placement, transfer and promotion in business, industry
and government service including the armed forces; supervisory training in
business, industry and government; illumination, ventilation and fatigue in
industry; job analysis, description, classification and evaluation;
measurement of morale of executives, supervisors, or employees; surveys of
opinion on social or political issues, such as those conducted by The
Psychological Corporation; psychological problems in market research and in
advertising.


Articles may be under 500 words. The maximum is 16,000 words, the average in
the neighborhood of 4,000 words. To reduce lag of publication, adherence to the rule of 
“brevity consistent with clarity” is encouraged. 


A lapse of six to twelve months occurs between acceptance of an article and its 
publication, the lag varying with the rate at which manuscripts are submitted. If, how- 
ever, an author is prepared to defray the costs of printing the necessary extra pages, he 
may arrange for earlier publication without thereby postponing the appearance of manu- 
scripts by other contributors. This enables the management to provide space in addition 
to the scheduled 80 pages per issue. “Early publication” is thus a direct contribution to 
the subscribers. By cutting down lag in publication, it also benefits those authors whose 
articles are published in regular turn.


Tables, footnotes and references as well as text of manuscripts should be typed
double-spaced throughout. Authors should adhere to the conventions described by J. E.
Anderson and W. L. Valentine in “The preparation of articles for publication in the
journals of the American Psychological Association,” Psychol. Bull., 1944, 41, 345-376.
A reprint of this article will be loaned to any prospective contributor who does not find 
it in his library. 





Journal of Applied Psychology 








Vol. 33, No. 2 April, 1949 








The Quantification of an Industrial Employee Survey. 
I. Method * 


Frank J. Harris †
Division of Education and Applied Psychology, Purdue University 


The research project to be described is an attempt to develop a new 
technique to measure quantitatively the morale of industrial employees. 
In the past, two general approaches have been made to this problem. 
The first is an adaptation of the attitude scaling technique first described 
by Thurstone and Chave (3). By means of this technique a score is 
obtained which indicates the general attitude of employees toward the 
company for which they work. However, this type of scale does not 
provide management with very much insight regarding the attitudes of 
employees toward specific policies or practices. 

The second approach consists of asking a number of questions about 
specific aspects of company policy. This type of employee opinion 
survey does provide management with the opportunity to obtain answers 
to those relatively specific questions with which it is often concerned. 
On the other hand, the answers determined from such a survey do not 
give overall attitude scores of the type required if departmental, tenure, 
sex, or other similar comparisons are to be made. 

The present study is an extension of the general employee opinion 
survey approach. Briefly, the questions on the survey are statistically 
treated according to the generally accepted principles of test construction 
and standardization, thus combining the practical merits of the opinion
survey with the quantitative aspects of the attitude scale. 

We have been speaking of morale as if it were a term the definition
of which was generally agreed upon. Such, of course, is not the case.
However, the adequacy of this study will not stand or fall on terminologies
and for our purposes it has seemed unnecessary to go beyond an operational
definition of morale. The definition adopted therefore is: “Morale
is the attitude of the employee, as expressed on an anonymous questionnaire,
toward the company for which he works, with a favorable attitude
representing relatively high morale and an unfavorable or neutral attitude
representing a relatively lower level of morale.”

* This article is based on the author’s dissertation entitled “The Development of a
Quantitative Morale Score from a Generalized Industrial Employee Survey” submitted
to the Faculty of Purdue University in partial fulfillment of the requirements for the
degree of Doctor of Philosophy, August, 1948. The dissertation was directed by Dr.
Joseph Tiffin.

† The author is now serving as Research Psychologist, Division of Commissioned
Officers, Public Health Service, Federal Security Agency, Washington, D. C.


Development of the Morale Scale 


Description of the original survey. The data were obtained from a 
survey conducted early in 1948 by the Victor Adding Machine Company 
in Chicago, Ill. The forms were mailed to the home addresses of all 
employees of the company. The forms were returned by the employees 
directly to Purdue University. The individual employees could not be 
identified in any way. The employees had been informed in advance 
of the nature of the project and their cooperation was requested in filling 
out and mailing the forms in an enclosed self-addressed, stamped envelope. 
Approximately 800 questionnaires representing 75% of the employees 
were returned. All of the data were coded and punched on I.B.M. cards 
for more convenient analysis. An analysis of the percentages of em- 
ployees in various categories responding to each alternative was made 
quite independently of this study and forwarded to company officials. 

Initial screening of items. Not all of the items in the original ques- 
tionnaire could be presumed to be directly measuring attitude toward 
management or toward the company. Accordingly 48 questions were 
selected from the total which were considered to be appropriate to 
the study at hand. These 48 items with their alternative responses 
were then reproduced and presented to 10 judges with the following 
instructions: 

The statements below are part of a questionnaire administered to employees 

of a manufacturing company. Kindly check the one response to each 

question which you think most strongly represents a favorable attitude 
toward the company. 

The judges were all advanced students in or professors of industrial 
psychology. It was arbitrarily determined that items on which there 
was 80% agreement or better would be retained at this stage. Forty-six 
items met this criterion. In fact there was unanimous agreement on 
42 items, 90% agreement on one, and 80% agreement on two. On one 
item of the questionnaire, which dealt with the filling of job vacancies, 
there were six possible responses. On this item, seven judges selected 
one response, while the three other judges selected a second response. 
It seemed logically justifiable to retain this item by considering either of 
these two alternatives as favorable. 
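
The screening rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `screen_item`, the `allow_two` switch, and the ballots shown are hypothetical and do not come from the survey data; only the 80% criterion and the ten judges are taken from the text.

```python
from collections import Counter

def screen_item(choices, threshold=0.80, allow_two=False):
    """Screen one item by judge agreement.

    choices   : the response each judge marked as most favorable
    threshold : minimum proportion of judges who must agree (0.80 in the text)
    allow_two : if True, the two most frequently chosen responses are both
                treated as favorable (the treatment given the job-vacancy item)
    Returns (accepted_responses, agreement, retained).
    """
    ranked = Counter(choices).most_common()
    top = ranked[:2] if allow_two else ranked[:1]
    accepted = [response for response, _ in top]
    agreement = sum(votes for _, votes in top) / len(choices)
    return accepted, agreement, agreement >= threshold

# Hypothetical ballots from ten judges (labels are placeholders).
eight_of_ten = ["Yes"] * 8 + ["No"] * 2
seven_three = ["A"] * 7 + ["B"] * 3

print(screen_item(eight_of_ten))                 # retained at exactly 80%
print(screen_item(seven_three))                  # 70%: fails with one response
print(screen_item(seven_three, allow_two=True))  # retained if either counts
```

Treating two responses as favorable is a judgment call rather than a statistical rule, which is why it is an explicit switch in the sketch rather than the default.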







Selection of stratified random samples. From the total number of 
questionnaires returned, those on which the respondent had failed to 
answer all of the biographical items were discarded. The remaining 
753 were divided into two groups. These groups were randomly selected 
after the following stratifications had been imposed: male or female; 
married or single; weekly or hourly-paid; worker, set-up man or super- 
visor; length of service. One group, consisting of 377 employees, was 
considered the experimental group; the other, consisting of 376 employees, 
was held out for further analysis at a later stage. 
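
One way such a division might be carried out is sketched below. The exact randomization procedure is not reported, so the approach shown (shuffle within each stratum, then deal cases alternately to the two groups) is an assumption, as are the function name `stratified_halves`, the field names, and the synthetic records.

```python
import random
from collections import defaultdict

def stratified_halves(records, strata_keys, seed=0):
    """Split records into two random groups balanced within every stratum.

    records     : list of dicts holding the biographical items
    strata_keys : names of the fields that define a stratum
    Cases are shuffled inside each stratum and dealt alternately, so the two
    groups differ by at most one case in any stratum and in total.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)

    first, second = [], []
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for rec in members:
            # Always give the next case to whichever group is smaller.
            target = first if len(first) <= len(second) else second
            target.append(rec)
    return first, second

# Synthetic records standing in for the 753 usable questionnaires.
demo = [{"sex": "M" if i % 3 else "F",
         "pay": "weekly" if i % 4 == 0 else "hourly"} for i in range(753)]
experimental, holdout = stratified_halves(demo, ["sex", "pay"], seed=1)
print(len(experimental), len(holdout))
```

With 753 cases this dealing rule necessarily yields groups of 377 and 376, matching the sizes reported above.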

Item analysis. A key card was prepared on which was punched the
response to each item which represented “high-morale.”1 The cards of
the experimental group were scored in terms of the total number of high- 
morale responses. The 100 highest scoring employees and the 100 lowest 
scoring employees were selected and the degree of internal consistency 
of each item was determined in terms of discrimination or D-values 
using Lawshe’s nomograph (1). Items having a D-value of 1.0 or 
better were arbitrarily retained to comprise the scale. The 36 items 
which met this criterion, with their respective D-values, and with the 
high-morale response indicated, are presented in Table 1. It will be noted 
that, without any such intention on the part of the author, these items 
embrace many of the factors which various investigators have reported 
to be related to industrial morale. It is also worthy of mention that all 
of the final 36 items are those which the judges had previously agreed 
upon unanimously, provided either of two responses is accepted for 
item 36. 
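
The upper-lower group comparison behind the item analysis can be sketched as follows. Lawshe's nomograph itself is not reproduced here; the index computed by `deviate_difference` (a difference between normal deviates of the two proportions) is offered only as a stand-in of the same general kind, and it should not be assumed to return the D-values reported in Table 1. All function names and the toy data are hypothetical.

```python
from statistics import NormalDist

def total_scores(responses, key):
    """Number of keyed (high-morale) responses for each respondent."""
    return [sum(r == k for r, k in zip(person, key)) for person in responses]

def upper_lower_proportions(responses, key, group_size):
    """Proportion giving the keyed response in the highest- and lowest-scoring
    groups, returned as one (p_high, p_low) pair per item."""
    scores = total_scores(responses, key)
    order = sorted(range(len(responses)), key=lambda i: scores[i])
    low, high = order[:group_size], order[-group_size:]
    pairs = []
    for j, keyed in enumerate(key):
        p_high = sum(responses[i][j] == keyed for i in high) / group_size
        p_low = sum(responses[i][j] == keyed for i in low) / group_size
        pairs.append((p_high, p_low))
    return pairs

def deviate_difference(p_high, p_low, group_size):
    """Stand-in discrimination index (NOT Lawshe's nomograph): difference
    between the normal deviates of the two proportions. Proportions of 0 or 1
    are pulled in by half a case so that the deviate is finite."""
    eps = 1 / (2 * group_size)
    def clamp(p):
        return min(max(p, eps), 1 - eps)
    z = NormalDist().inv_cdf
    return z(clamp(p_high)) - z(clamp(p_low))

# Toy data: six respondents, three items, keyed response "a" on every item.
key = ["a", "a", "a"]
responses = [list("aaa"), list("aab"), list("abb"),
             list("bbb"), list("aaa"), list("bba")]
for p_high, p_low in upper_lower_proportions(responses, key, group_size=2):
    print(p_high, p_low, round(deviate_difference(p_high, p_low, 2), 2))
```

In the study itself the groups were the 100 highest and 100 lowest scorers of the 377 in the experimental sample, so `group_size` would be 100.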

Reliability of the scale. At this point the experimental group of cards 
upon which the scale had been developed and analyzed was removed from 
further consideration. The second group which had been held out until 
this time was now scored in terms of the 36-item key. The odd-even 
reliability coefficient for this group was determined to be .72; correcting 
by means of the Spearman-Brown formula for the complete scale of 36 
items yielded a reliability coefficient of .84. 
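
The arithmetic of the correction can be checked directly. The sketch below applies the Spearman-Brown formula to the reported half-scale coefficient of .72 and also shows how an odd-even coefficient might be computed from item scores; the function names and the small data set are illustrative only.

```python
def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def odd_even_reliability(item_scores):
    """Correlate each respondent's score on the odd-numbered items with the
    score on the even-numbered items (items scored 1 = high-morale, 0 = other)."""
    odd = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    return pearson(odd, even)

def spearman_brown(r_half, factor=2.0):
    """Estimated reliability of a scale `factor` times as long as the part
    on which r_half was obtained."""
    return factor * r_half / (1 + (factor - 1) * r_half)

print(round(spearman_brown(0.72), 2))  # 2(.72) / (1 + .72) = .84
```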


Analysis of Morale Scores 


Once the morale scale had been developed and was found to have
satisfactory reliability, it was possible to proceed with an analysis of the
scores of various categories of employees. The results of this analysis
are shown in Table 2. In all of the comparisons presented the significance
of differences was determined for group means and for group standard
deviations. The significance of mean differences is expressed in terms
of Fisher’s t statistic; the significance of standard deviation differences
is expressed in terms of Fisher’s F-ratio as tabled by Snedecor (2). A
t value which is significant at the 10% level of confidence is indicated
by an asterisk. A t or F value which is significant at the 5% level is
indicated by two asterisks; at the 1% level by three asterisks. These
asterisks are inserted for the convenience of the reader.

1 In this study, responses chosen by the judges as representing the most favorable
attitude toward the company are termed “high-morale” responses. As used here the
term may be considered as equivalent statistically to the term “correct” as it is
customarily employed in item analyses of test items.


Table 1

Final Morale Scale Items and Discrimination Values

Item

What Is Your Opinion of Your Boss (the Man You Report to)

 1. Does he “know his stuff”?
 2. Does he play favorites?
 3. Does he keep his promises?
 4. Does he pass the buck?
 5. Does he welcome suggestions?
 6. Is he a good teacher?
 7. Do the workers know more than he does?
 8. Does he set a good example?

Do You Feel You Understand the Following Provisions of the Employees’
Security Fund?

 9. How the money is divided among the employees? Yes..x..
10. How the Company decides how much goes to this fund? Yes..x..
11. How the Security Fund money is invested? Yes..x..
12. How much you get if you leave, die or retire? Yes..x..

13. Do you feel that you are receiving considerate treatment here?
    Yes..x..No
14. Do you feel top management is interested in the employees? Yes..x..
15. Have you ever recommended this Company as a place to work to a friend?
16. Do you feel you have a good future with this Company?
17. What do you think of working conditions here as compared with other
    plants?
    Above average..x..Average
18. How do you think your average weekly earnings (gross earnings before
    deductions) compare with that paid in other companies for the same type
    of work?
    Better here..x..About the same

Give Careful Thought to the Following List of Company Policies Affecting
Employees, Working Conditions, and Employee Benefits. Then Check What You
Think About Each Item as It Is Being Carried Out.

                                        Dislike    Not Interested
19. Group Insurance Plan
20. Security Fund-Profit Sharing
21. Service Pin Awards
22. Vacations
23. Credit Union
24. Chance for promotion
25. Medical Department
26. Cafeteria
27. Lockers
28. Suggestion System
29. Employee Committees

30. Do you find your fellow workers:
    Friendly..x..Unfriendly
31. What does your family think of this Company?
    Good place to work..x..No opinion
32. How do you like your present job?
    Very much..x..Not so good
    Pretty good      Don’t like it
33. Do you think the employees have confidence in the operating heads of
    the business?
    Most employees do..x..
    About half
34. How do you feel your opportunities in this Company compare with those
    with your last employer?
    Better..x..Not so good
    Never worked elsewhere
35. What are your work plans for the future?
    Hope to remain here..x..Plan to work only a short time
    Do not plan to work......I have other work plans
36. When desirable job vacancies arise, how do you feel they are generally
    filled?
    By both ability and service
    By employing people outside the Company
    By promoting favored employees who are not especially qualified
    By giving first chance to employees of long service
    By taking the most qualified person
    I am not sure how they are filled
Sex differences. It will be noted that although the mean morale scores
of the sexes do not differ significantly, the men are significantly more
variable than the women.

Marital status. All married employees combined yield a significantly 
higher mean morale score than all single employees combined. In an 
attempt to determine whether either sex might account for this difference, 
a further breakdown was made. It may be seen that differences between 
the married and the single may be attributed primarily to the significantly 
higher scores obtained by married men as compared with single men. 


Table 2 


Comparisons of Morale Scores Among Employee Sub-Groups 








N Mean S.D. t F





Sex 


Male 26.99 
Female 26.91 


Marital status 

Married 

Single 5. 

Married men 7.66 MM vs SM 

Single men 5 25.3: MW vs SW 

Married women 27.36 MM vs MW 
Single women 26.35 =. SM vs SW 


Type of job

Worker 8 W vs S
Supervisor W vs S-U
Set-up man 34 S vs S-U
Method of pay 

Weekly 98 

Hourly 278 

Weekly-paid worker 53 

Hourly-paid worker 188 

Weekly-paid supervisor 43 

Hourly-paid supervisor 58 


1.16 
1.86** 


Length of service 
Under 6 months 50 2.22** 1.64**
6 mos. to 1 year 86
1 to 2 years 87 1.66 1.09
132 1.32
2 to 5 years 65
1.21 1.26
5 to 10 years 62 3.14*** 1.05
Over 10 years 26 65





* Significant at the 10% level. 
** Significant at the 5% level. 
*** Significant at the 1% level. 







Marital status does not appear to affect the morale scores of women 
employees significantly. It is also evident that the greater homogeneity 
of women’s scores is due more to the single than to the married women. 

Differences in type of job. Employees were asked to indicate whether 
their job was best classified as that of a worker, supervisor, or set-up 
man. Comparative morale scores were determined for these three general 
categories. None of the differences between these groups is significantly 
greater than chance alone could reasonably explain, contrary to what 
one might expect on the basis of previously reported findings. 

Weekly vs. hourly-paid jobs. The scores of all employees who were 
on a weekly salary were compared with the scores of all employees who 
were paid on an hourly rate basis. Since there was a tendency for weekly- 
paid employees to score higher than hourly-paid employees, further anal- 
yses of the data were made to determine whether workers or supervisors 
might account for this trend.2 The results of this analysis indicate that 
the weekly-paid supervisors account for the higher morale scores of 
weekly-paid employees in general. Also, as a group, the scores of weekly- 
paid supervisors are more homogeneous than the scores of any com- 
parable group. 

Length of service. Employees were asked to indicate whether they
had worked for the company (1) under six months, (2) from six months
to one year, (3) from one to two years, (4) from two to five years, (5) from
five to 10 years, or (6) over 10 years. Scores were analyzed in terms of
these six categories. The results indicate that morale scores are lowest
and most heterogeneous under six months. From six months to 10
years they appear to fluctuate to an insignificant extent. After 10 years
they again take a significant swing upwards.


Summary and Conclusions 


An attempt was made to develop a quantitative morale scale by 
treating responses to an industrial employee survey according to standard 
test development procedures. A questionnaire containing specific items 
of interest to management was filled out anonymously by approximately 
75% of the employees of a Midwestern manufacturing company and 
mailed directly by each employee to Purdue University. 

Questions which were obviously related to morale were judged by 
10 competent individuals in terms of the one alternative response which 
represented a favorable attitude toward the company. The 46 items 
upon which there was 80% or higher agreement constituted the original 
scale. 


2 Set-up men were not included in this analysis since 32 of the 34 employees in this 
category were weekly-paid. 










The questionnaires were then separated into two stratified random 
samples containing 377 and 376 cases respectively. One of these samples 
was scored and a high and low group of 100 cases each, based on total 
score, were selected. The per cent of the high scoring group and the 
per cent of the low scoring group responding to each item was determined. 
From these percentages the discrimination value of each item was com- 
puted. The 36 items having D-values of 1.0 or higher constituted the 
final scale. The sample which had been held out was scored in terms of 
the 36-item key. A corrected reliability coefficient of .84 was obtained 
by the split-half (odd-even) method. 

The results of an analysis of the morale scores of various employee 
sub-groups are presented. 

It should be pointed out and emphasized that the results of the anal- 
yses reported here are specific to the data from the particular company 
which cooperated in the study. Any attempt to generalize from these 
results as to the relative levels of morale among various groups of in- 
dustrial employees would be hazardous. 

No attempt has been made to explain the differences or lack of dif- 
ferences found. Such differences can be most safely interpreted by 
individuals thoroughly familiar with the plant from which the data were 
obtained. The results of the survey give such individuals a clue as to 
the focal points of the industrial relations program which might possibly 
call for special attention. 

The methodology used in developing the scale may, on the other 
hand, be profitably applied to any industrial situation where data of the 
type described are available. The advantages accruing to any given 
company from this approach are several. In addition to the types of 
information usually obtained from an employee survey of this sort it 
becomes possible to: 


1. Obtain a reliable and quantitative estimate of the relative morale 
levels of various groups of employees such as workers and supervisors, 
old and new employees, married and single employees. 

2. Obtain a reliable indication of those areas in which a change in 
policy would seem desirable. 

3. Secure comparable data with which to compare the state of morale 
from time to time and thus reflect the effect of any changes introduced 
by management. 

4. Accomplish the above with little more effort than is involved in 
the treatment of the ordinary attitude survey. 


Much more could be accomplished by further extensions of this
approach. Working cooperatively through a common consultant a
number of companies would be able to obtain an indication of the level
of morale of their employees as compared with employees of other
companies. If it were possible also to relate the morale score of the worker
to the supervisor under whom he works, industry would be better able to
deal with one of the major sources of differences in employee attitude.
It is hoped that the technique reported here will lead to further
investigations along these lines.


Received September 23, 1948. 


References 


1. Lawshe, C. H., Jr. A nomograph for estimating the validity of test items. J. appl. 
Psychol., 1942, 26, 846-849. 

2. Snedecor, G. W. Statistical methods. Ames, Iowa: Iowa State College Press, 1946. 
485 pp. 

3. Thurstone, L. L., and Chave, E. J. The measurement of attitude. University of 
Chicago Press, 1929. 



































The Quantification of an Industrial Employee Survey. 
II. Application * 





Frank J. Harris †


Division of Education and Applied Psychology, Purdue University 


In a previous paper,1 the author described a technique for developing
a quantitative morale score by applying the principles and methods of 
test construction to an industrial employee survey. Advantages claimed 
for this approach are that the employer can secure comparable data with 
which to compare the state of morale from time to time, can obtain a 
reliable indication of those specific areas in which a change of policy might 
seem desirable, and is provided with a measure of the effect of any 
changes that are instituted. The present paper attempts to illustrate 
these advantages.

The survey from which the morale scale was developed was conducted 
in 1948. A similar survey had been conducted in 1945 for the same com- 
pany, by the same consultant, and in the same manner. Of the 36 items 
selected from the 1948 survey to constitute the morale scale, 35 had 
appeared on the 1945 survey with minor modifications in wording in a 
few instances. In the earlier survey, 555 or 65% of the employees re- 
turned the questionnaire. Of the total respondents 60% were men and 
40% were women. In the later survey, 800 or 75% of the employees
responded of whom 66% were men and 34% women. Thus the two 
groups can be expected to be reasonably comparable in sex ratio and 
employee representation. 

The results of the two surveys were examined to determine the direction
and extent of any changes which might have occurred in the intervening
three year period. A comparative study of this sort could be
made in at least two general ways, depending upon the type of information
desired. One way would be to score both sets of questionnaires
in terms of the 36-item key. From the obtained scores it would be possible
to compare the morale of employee sub-groups at the earlier and at
the later time. Another way would be to compare the responses to each
item by determining the per cent of employees who responded favorably
to the item at each administration of the questionnaire form. The latter
type of analysis was made in this study; for each item the difference was
determined between the per cent of respondents who indicated a favorable
response in 1945 and the per cent who indicated a favorable response in
1948. The level of significance of the differences between these percentages
was determined by the computation of t-values.

* This article is based on the author’s dissertation entitled “The Development of a
Quantitative Morale Score from a Generalized Industrial Employee Survey” submitted
to the Faculty of Purdue University in partial fulfillment of the requirements for the
degree of Doctor of Philosophy, August, 1948. The dissertation was directed by Dr.
Joseph Tiffin.

† The author is now serving as Research Psychologist, Division of Commissioned
Officers, Public Health Service, Federal Security Agency, Washington, D. C.

1 Harris, F. J. The quantification of an industrial employee survey. I. Method.
J. appl. Psychol., 1949, 33, 103-111.

The principal findings are summarized as follows:2

1. On 19 of the 35 items there was a shift in the high-morale or favor- 
able direction at the 1% level of significance or better. 

2. There was a favorable shift on three items at the 2% to 10% level 
of significance. 

3. Only one item, “Does your boss play favorites?”, showed an unfavorable
change in attitude (at the 4% level of significance).

4. The remaining 12 items revealed slight changes in either direction 
which could be explained readily on the basis of chance alone. 


In addition to changes in attitudes toward certain items an estimate
was obtained of the general level of attitude toward each item or policy
represented thereby. For example, although there was a markedly
favorable shift in attitude toward comparative weekly earnings (from 12%
responding favorably in 1945 to 37% responding favorably in 1948) the
“level” of attitude remained rather low. On the other hand, 90% of the
employees liked the group insurance plan in 1945 and in 1948.
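
The item-by-item comparison of percentages can be illustrated with the weekly earnings example. The paper does not give the formula used for its t-values, so the pooled-proportion critical ratio below is an assumption. The sample sizes are a further assumption: the total numbers of respondents (555 and 800) are used because the number answering each item is not reported. The function name is hypothetical.

```python
from math import sqrt

def proportion_difference_ratio(p_first, n_first, p_second, n_second):
    """Critical ratio for the difference between two independent proportions,
    using a standard error based on the pooled proportion. Positive values
    mean the second survey was more favorable than the first."""
    pooled = (p_first * n_first + p_second * n_second) / (n_first + n_second)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_first + 1 / n_second))
    return (p_second - p_first) / std_error

# Comparative weekly earnings: 12% favorable in 1945, 37% favorable in 1948.
ratio = proportion_difference_ratio(0.12, 555, 0.37, 800)
print(round(ratio, 2))
```

With samples of this size a ratio of about 2.58 corresponds to the 1% level, so a shift of 25 percentage points falls far beyond it under these assumptions.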

The final interpretation of the findings and the uses to be made of 
them rest on decisions of the sponsoring company. The changes in morale 
or attitude were undoubtedly influenced to some extent by factors ex- 
ternal to the company, e.g. conversion from war to peacetime production. 
However, management now has reliable indices of how certain of its 
practices have been received by the employees, has some definite clues 
as to what effects its policy changes have had on morale, and is in a better 
position to chart its future course in personnel relations. 


Received September 23, 1948. 


2 Complete data are on file in the Purdue University Library and in order to reduce 
printing costs a summary prepared in table form has been deposited with the American 
Documentation Institute. Order Document 2625 from American Documentation 
Institute, 1719 N St., N.W., Washington 6, D. C., remitting $.50 for microfilm (images 
1 inch high on standard 35 mm. motion picture film) or $.50 for photocopies (6 by 8 
inches) readable without optical aid. 





“Item Analysis” Versus “Scale Analysis” *
Philip H. Kriedt and Kenneth E. Clark 


University of Minnesota 


In the last few years Dr. Louis Guttman of Cornell University has 
developed a new and increasingly popular technique for determining 
whether or not a test or attitude scale possesses unidimensionality (8). 
This paper presents a comparison of this technique of scale analysis with 
two older methods of item analysis, in order to determine the comparative 
values of each method for selecting from a pool of items those which 
belong together, either in terms of their internal consistency, or in terms 
of their unidimensionality. 

The three methods herein compared are: (1) the Cornell Technique 
of Scale Analysis, in which the essential statistic is reproducibility, and in 
which emphasis is placed on the ability to predict or “reproduce” the
response of an individual to every item of a scale in terms of his total 
score on that scale; (2) one common form of item analysis, specifically, 
that in which the item-responses made by persons in the top twenty-seven 
per cent of the distribution on total score are compared with the re- 
sponses made by persons in the bottom twenty-seven per cent on total 
score, using the phi coefficient as a measure of the correlation between 
item and total score; and (3) the determination of inter-correlations be- 
tween items as a means of selecting those which are measuring the same 
thing, using as the measure of relationship the tetrachoric correlation 
coefficient. 
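
The notion of reproducibility in method (1) can be made concrete with a small sketch. It is not a transcription of the Cornell Technique's procedure: it assumes dichotomous items and counts errors against the single cutting point on total score that minimizes them, which is one common convention. The function names and data are hypothetical.

```python
def item_errors(total_scores, item_responses):
    """Smallest number of respondents whose response to a dichotomous item
    (1 = favorable, 0 = other) is mispredicted by any rule of the form
    'favorable if total score >= cut'."""
    pairs = list(zip(total_scores, item_responses))
    # Candidate cuts: every observed score, plus one above the maximum
    # (which predicts that nobody responds favorably).
    cuts = sorted(set(total_scores)) + [max(total_scores) + 1]
    return min(
        sum((score >= cut) != bool(resp) for score, resp in pairs)
        for cut in cuts
    )

def item_reproducibility(total_scores, item_responses):
    """Proportion of responses to one item reproduced from total score."""
    return 1 - item_errors(total_scores, item_responses) / len(total_scores)

def scale_reproducibility(total_scores, items):
    """Proportion of all responses, over all items, reproduced from total score."""
    errors = sum(item_errors(total_scores, item) for item in items)
    return 1 - errors / (len(total_scores) * len(items))

scores = [1, 2, 3, 4, 5]
print(item_reproducibility(scores, [0, 0, 1, 1, 1]))  # no prediction errors
print(item_reproducibility(scores, [0, 1, 0, 1, 1]))  # one error in five
```

Note that an item answered the same way by nearly everyone earns a high value under this definition whatever its relation to total score, since predicting the modal response is already nearly always right.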

A 72-item Likert-type questionnaire on attitudes toward Negroes, 
made up of items with 3, 5, and 7 categories of response, was administered 
to 183 students in an elementary course in Social Science at the University 
of Minnesota. In general, the content of these items was extremely 
heterogeneous. The scale was scored initially by assigning arbitrary unit
values to each of the response categories, as in the usual Likert-type scale. 

Since the analytic methods being compared would be affected con- 
siderably by the methods used to reduce all item responses to dichotomies, 
some preliminary work was done to determine how best to group item 
responses. Response categories were first combined so as to maximize 
the per cent reproducibility of each item and, whenever possible, to make 


* The writers are indebted to the University of Minnesota Graduate School for the 
research grant that made this study possible. 


114 





“Ttem Analysis” Versus “Scale Analysis” 115 


each category have less error than non-error, in accordance with the re- 
quirements of the Scale Analysis methods. However, this approach did 
not seem particularly promising for the development of a good scale, since 
dichotomizing items so as to maximize reproducibility, without regard 
for other item characteristics, tends to provide dichotomies with high 
modal response frequencies; that is, items which are answered the same 
way by a large proportion of the respondents. Items were also dichoto- 
mized, therefore, on the basis of item correlation with total score. The 
top 27 per cent and the bottom 27 per cent on total score were selected, 
and their responses to every possible dichotomy of response categories 
were compared, using phi coefficients obtained from Jurgensen’s tables
(9). That combination which maximized the phi coefficient was used.
These dichotomies had few high modal response frequencies, since the 
method used tends to penalize items deviating markedly from a 50-50 
split. Ten items were found to have such low phi coefficients for any 
combination of responses that they were not included in the later analyses. 
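As an illustration of the dichotomizing step just described, the following is a minimal sketch, not the writers' procedure. It assumes that response categories are ordered and coded 0, 1, 2, ..., and that only contiguous cut points are tried; the text says "every possible dichotomy", which may have included other groupings. The phi coefficient is computed directly rather than read from Jurgensen's tables, and the response data are hypothetical.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]].

    a = top group, 'high' response;    b = top group, 'low' response
    c = bottom group, 'high' response; d = bottom group, 'low' response
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else (a * d - b * c) / denom


def best_cut(top, bottom, n_categories):
    """Try each cut point k (categories >= k count as 'high') and
    return (k, phi) for the cut that gives the largest phi."""
    best = None
    for k in range(1, n_categories):
        a = sum(r >= k for r in top)
        b = len(top) - a
        c = sum(r >= k for r in bottom)
        d = len(bottom) - c
        value = phi(a, b, c, d)
        if best is None or value > best[1]:
            best = (k, value)
    return best


# Hypothetical responses to one 5-category item (coded 0..4) from the
# top 27 per cent and bottom 27 per cent on total score.
top_group = [4, 4, 3, 3, 3, 2, 4, 3, 2, 4]
bottom_group = [0, 1, 1, 2, 0, 1, 2, 1, 0, 3]
cut, value = best_cut(top_group, bottom_group, 5)
print(cut, round(value, 3))
```

In this made-up example the cut that groups categories 2 through 4 against 0 and 1 gives the largest phi. An item for which no cut yields an acceptable phi would be a candidate for exclusion, as with the ten items dropped in the text.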

For the remaining 62 items, all of which were now dichotomized, the 
following computations were made: all inter-item correlations, using the 
Cheshire, Saffir, and Thurstone computing diagrams (1) (method A); 
phi coefficients for each item versus total score on the 62 item dichoto- 
mized scale (method B); and the per cent reproducibility of each item as 
a part of the 62-item scale (method C). In addition, all seventy-two 
items were carefully read by the writers, and twenty-seven items selected 
as representing, in the judgment of the writers, the primary factor being 
measured by the scale. All questionnaires were rescored for these twenty- 
seven items, and reproducibilities computed for each of the items using 
this new total score (method D). 

Four separate bases thus existed for the selection of items for a 
shorter, more unified scale. Using each method, a ten-item scale was 
constructed. These four 10-item scales will be referred to hereafter as 
scales A, B, C, and D. Scale A was made by selecting the ten items 
whose intercorrelations with each other would be maximized; scale B was 
made up of items with the highest correlation with the total score; scale 
C consisted of the 10 items having highest reproducibility in the 62-item 
computation; and scale D the 10 items having highest reproducibility 
selected from the special group of 27 items. The items in the two 
“reproducibility” scales (scales C and D) had no more error than non- 
error in each category, as required by the Guttman method. These two 
scales were almost identical, having eight of their ten items in common. 
None of the other scales, however, had more than three items in common. 

A statistical description of each of these four scales is presented in 
Table 1. For each scale is reported: (1) reproducibility when rescored 





Table 1

Item and Scale Characteristics of Ten Best Items on: Inter-Item Correlations (Scale A), Correlation Between Item and Total Score (Scale B), Reproducibility Using Pool of 62 Items (Scale C), and Reproducibility Using Pool of 27 Items (Scale D)

[Table body not recoverable from the scan. Panels, each giving a frequency distribution with N and mean for Scales A, B, C, and D: Distributions of % Reproducibilities; Distributions of Phi Coefficients (Item vs. Total Score); Distributions of Median Inter-Item r’s; Distributions of Modal Response Frequencies.]







for the ten-item scale; (2) phi coefficient indicating correlation between 
item response and total score for the ten-item scale (using top 27 per cent 
against bottom 27 per cent); (3) the median inter-item correlation be- 
tween an item and the other nine items in the scale, using tetrachoric 
r’s; and (4) the modal response frequency of items (i.e., the percentage 
of respondents who answered the item in the same way). The odd-even 
reliability (estimated from the Spearman-Brown prophecy formula) is as 
follows for each of the ten-item scales: A, +.90; B, +.91; C, +.83; and 
D, +.86. 
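For readers unfamiliar with the Spearman-Brown step, a short sketch follows. The half-test (odd versus even) correlations are not reported in the paper, so the inverse function below is included only to show what half-test correlation would correspond to a given full-length reliability; the function names are illustrative.

```python
def spearman_brown(r_half, factor=2.0):
    """Reliability of a test lengthened by `factor`, given the
    correlation r_half between the two halves (factor=2 for odd-even)."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def half_from_full(r_full):
    """Inverse of the factor-2 formula: the half-test correlation that
    would step up to the full-length reliability r_full."""
    return r_full / (2 - r_full)


# Reported odd-even reliabilities of the four ten-item scales.
reported = {"A": 0.90, "B": 0.91, "C": 0.83, "D": 0.86}
for scale, r_full in reported.items():
    r_half = half_from_full(r_full)
    print(scale, round(r_half, 3), round(spearman_brown(r_half), 2))
```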


Results 


The relative merits of each of the three methods of item selection and 
scale refinement are discussed below in terms of the data presented. 

Scale A (Inter-Item Tetrachoric Correlations). The selection of ten 
items from the pool of 62 items so as to maximize the median inter-item 
tetrachoric correlation coefficient produced a scale which does not quite 
meet Guttman’s criterion of 90 per cent reproducibility (see Table 1). 
However, this scale does compare favorably with the other scales when 
examination is made of the phi coefficients for each item versus the total 
score, and of the modal response frequencies of the ten items. Thus, 
the use of this method of item selection yields a relatively good scale in 
spite of the fact that the tetrachoric r is not an appropriate statistic to 
use with data of this kind. Had a more appropriate measure of correlation
been used, one would assume that this method for selecting items
would have yielded the best scale of the four. For the writers to have 
used another statistic would have made the labor and expense of com- 
putation with a matrix of 62 items prohibitive. The writers therefore 
fell back on the same solution used by many others and resorted to the 
Thurstone et al. tetrachoric computing diagrams as a ready and reasonably
approximate estimate of the relationship between items. Some
items, however, have extreme response splits, so that the value of r could 
only be estimated, or became a meaningless value of plus or minus 1.0. 
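The behavior of the tetrachoric r with extreme response splits can be illustrated with the well-known cosine-pi approximation. This is an assumption made for the sake of the example: the writers used the Cheshire, Saffir, and Thurstone computing diagrams, not this formula, and the approximation is usually regarded as trustworthy only when both items split near 50-50. The cell counts are hypothetical.

```python
from math import cos, pi, sqrt


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for the 2x2 table
    [[a, b], [c, d]], where a and d are the concordant cells."""
    if b * c == 0:
        # An empty discordant cell forces the estimate to +1.0
        # (or leaves it undefined if a concordant cell is also empty).
        return 1.0 if a * d > 0 else float("nan")
    return cos(pi / (1 + sqrt((a * d) / (b * c))))


# Two items that both split 50-50: the estimate is informative.
balanced = tetrachoric_cos_pi(40, 10, 10, 40)

# One item answered the same way by nearly everyone: an empty cell
# pushes the estimate to exactly 1.0 whatever the true relationship.
extreme = tetrachoric_cos_pi(95, 0, 4, 1)
print(round(balanced, 3), extreme)
```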

Scale B (Top versus Bottom 27 Per Cent). Comparing the item re- 
sponses of extreme top and bottom groups fails to work as a method of 
producing a scale having unidimensionality, as defined by Guttman. It 
does, however, produce a scale having high internal consistency as measured
by the odd-even reliability coefficient (.91), or median item versus
total score phi coefficient (.79). Its items, moreover, discriminate well 


1 The disadvantages of using tetrachoric correlations with attitude
scale items are discussed by Gage (7). That tetrachoric r’s give different values than are
obtained with other statistics was demonstrated empirically for one ten-item matrix.
The greatest discrepancies occur when the modal response frequency approaches 100 per cent.








118 Philip H. Kriedt and Kenneth E. Clark 


over a wide range, being more satisfactory in this respect than the items 
producing higher reproducibilities. This method has the additional 
advantage of being less laborious, and of involving less judgment and 
more mechanical selection of items, than the Guttman methods. 

Scales C and D (Reproducibility). If one accepts Guttman’s definition 
of unidimensionality of a scale, one requires among other things that the 
scale have a per cent reproducibility of 90 per cent or more. The only 
scales which meet this requirement are scales C and D. Furthermore, 
practically the same results were obtained when items were selected from
a pool of 62 heterogeneous items as when from a much more homogeneous
group of 27 items. These results obtain even though the Cornell Technique
of Scale Analysis is not designed primarily as a method of item
analysis and item selection, and even though it is not intended to be used 
in the mechanical fashion in which it was used in this study. 

The use of scale analysis for selecting items does have some dis- 
advantages, however, in terms of the response distributions of items. 
Scales C and D selected items which were answered the same way by a
large proportion of the respondents (80.2 per cent and 81.6 per cent). 
Moreover, scales C and D are inferior to the other two scales in terms 
of odd-even reliability. 


Discussion 


It has been the purpose of the present paper to present the results 
of an application of the Cornell Technique of Scale Analysis to an attitude 
scale in order to compare its workings with those of two methods which 
have been heretofore considered appropriate for scale refinement. There 
are certain side issues which come up in the use of the Cornell Technique 
which complicate its use in such circumstances. The chief obstacle is 
that one must consider several features of an item at the same time in 
manipulating data for analysis. For instance, the fewer response categories
an item has, the easier it will be to “reproduce” that item’s response,
knowing an individual’s total score on the scale. A scale made
up of dichotomized items thus has higher reproducibility than a scale with 
the same items with three or more responses. Also, the prediction of an 
individual’s response to a particular item can be made with greater 
accuracy if a very high percentage of the total group answer that item 
in the same way. A scale made up of only very popular and very 
unpopular items will, therefore, have higher reproducibility than one 
made up of items of varying degrees of popularity. To avoid spuriously 
high reproducibility resulting from many items of this sort, Guttman has 
set up the requirement that no category have more error in it than non- 
error. Thus when 90 per cent agree with an item in a scale, we must 







predict correctly, half of the time, from total score, not only who the 
90 per cent are who say agree, but who the 10 per cent are who disagree. 

It is difficult to process one’s data keeping these various requirements 
in mind. (One wishes that the mechanics of scale analysis could receive 
the sort of synthesis and organization which the Wherry-Doolittle method 
provides in solving the problems of multiple regression.) In addition, 
one finds that the safeguards invented by Guttman occasionally permit 
worthless items to remain in the pool. The most serious weakness is in the
more-error-than-non-error-per-category rule. It is possible to have an
item with 99 per cent reproducibility which meets this rule, which is 
nonetheless worthless in that it has zero relation to the total score on 
the scale. If all but two persons agree with an item, and one of these 
has the highest total score and the other the lowest total score, then the 
item is valueless but meets the requirements Guttman sets forth. 
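The worthless item described above can be worked through numerically. The sketch below assumes 100 respondents with distinct total scores 0 through 99 (hypothetical values chosen only to make the arithmetic easy) and a cutting point placed just above the lowest scorer.

```python
n = 100
scores = list(range(n))      # hypothetical total scores, lowest to highest
item = [1] * n               # 1 = agree, 0 = disagree
item[0] = 0                  # the lowest scorer disagrees
item[-1] = 0                 # the highest scorer also disagrees

# Predict from total score: "disagree" for the lowest scorer, "agree" above.
predicted = [0] + [1] * (n - 1)
errors = sum(p != r for p, r in zip(predicted, item))
item_reproducibility = 1 - errors / n

# Error versus non-error within the "disagree" category.
disagree_errors = sum(1 for p, r in zip(predicted, item) if r == 0 and p != r)
disagree_correct = sum(1 for p, r in zip(predicted, item) if r == 0 and p == r)

# Numerator of the item-total covariance, kept in integers so the
# zero relationship is exact rather than a rounding artifact.
cov_numerator = n * sum(x * s for x, s in zip(item, scores)) - sum(item) * sum(scores)

print(item_reproducibility, disagree_errors, disagree_correct, cov_numerator)
```

Under these assumptions the item shows 99 per cent reproducibility, the "disagree" category has no more error than non-error, and yet the covariance between item and total score is exactly zero.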

Guttman’s techniques cannot be used easily by research workers who
have not had considerable experience with them.2 Much judgment
must be exercised in the combining of response categories and in balancing 
the several criteria of unidimensionality which Guttman has developed 
(reproducibility, no more error than non-error in each category, items
selected at various intervals along the range of modal response fre- 
quencies). Special care must be taken to avoid the selection of too many 
items with high modal response frequencies, since such items, while 
having high per cent reproducibilities, tend to have low reliability and low 
discriminating power. 

Thus in one sense, Guttman’s approach is a less satisfactory approach 
to the problems of scale refinement than the traditional methods. The 
worker is required to judge, first of all, whether or not items can logically 
be considered to belong together. He must then scrutinize the pattern 
of responses of individuals to each of these selected items in terms of 
total scores on the scale and decide how best to combine item response 
categories in order to improve the scale. Throughout the entire analysis, 
there are no rigorous tests applied to determine which of several methods 
will work best. In fact, the worker must keep in mind several different 
item characteristics while he works. 

In spite of the mechanical difficulties of scale analysis, however, the 
writers find it a valuable and useful technique. The judgmental processes 
mentioned above do have the beneficial effect of compelling the in- 
vestigator to become better acquainted with the data with which he 

2 Edwards (5) has shown that even Guttman’s own published data may be reworked
to yield results different from those originally reported.

3 Edwards and Kilpatrick (6) have described a method more precise than Guttman’s
for selecting items which will constitute a unidimensional scale using Guttman’s criteria
of unidimensionality.







works. The forcing of judgments on the worker constantly takes him 
back to the data themselves and this is highly desirable. Moreover, there 
are advantages in predicting a response from total score instead of the 
reverse, and in predicting from the total score instead of predicting the 
response to one item from the response to another item. Consider a 
scale which obviously has perfect unidimensionality; for instance, the 
questions: Are you over 10 years old? Are you over 20 years old? Are 
you over 30 years old?, etc. Knowing the total score, the responses to 
every item can be reproduced with perfection. Knowing the response
to only one item, one may or may not be able to predict the responses to 
all of the other items, and one cannot, therefore, always predict the total 
score without error. High reproducibility, therefore, has more meaning 
in defining the unidimensionality of a scale than either high item-versus- 
total score correlations or high item-versus-item correlations (4). 
Finally, one must avoid thinking of scalability, as defined by Guttman, 
as a “good” characteristic of a series of items and of non-scalability as a 
“bad” characteristic. The use which is to be made of the series of items 
must always be considered. If the measure in question is to be used as a 
predictor variable for instance, scalability may be irrelevant or even 
undesirable. If the measure is to be used in a study of mental or per- 
sonality organization (perhaps as a measure of what Gordon Allport has 
called a “common trait”), it should represent but one dimension, and,


hence, should be scalable. The measurement of public opinion also 
makes profitable use of scales having high reproducibility.4
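The age questions used earlier in this paragraph as an example of perfect unidimensionality can be sketched directly. The thresholds beyond 30 are assumptions standing in for the "etc." of the text.

```python
THRESHOLDS = [10, 20, 30, 40, 50]   # 40 and 50 are assumed continuations


def answers(age):
    """1 = 'yes' to 'Are you over t years old?' for each threshold t."""
    return [int(age > t) for t in THRESHOLDS]


def reproduce_from_total(total):
    """On a perfect scale the total score fixes every response: the
    first `total` questions are answered yes and the rest no."""
    return [1] * total + [0] * (len(THRESHOLDS) - total)


# Knowing the total score reproduces every item response.
for age in (5, 15, 25, 47, 80):
    assert reproduce_from_total(sum(answers(age))) == answers(age)

# Knowing only that someone is over 20 leaves the total score open.
possible_totals = sorted(
    {sum(answers(age)) for age in range(100) if answers(age)[1] == 1}
)
print(possible_totals)
```

A "yes" to the second question is consistent with several different total scores, which is the asymmetry the writers point to in preferring prediction from total score.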

In summary, the writers feel that Guttman’s new scale analysis 
techniques can prove to be very useful in problems of psychological 
measurement.5 Considerable discretion must be exercised, however,
both in the selection of suitable problems to which these methods may 
be applied and in the way the methods themselves are handled. 


Received September 4, 1948. 


References


1. Cheshire, L., Saffir, M., and Thurstone, L. L. Computing diagrams for the tetrachoric 
correlation coefficient. University of Chicago Book Store, Chicago, 1933. 

2. Clark, K. E., and Kriedt, P. H. An application of Guttman’s new scaling tech- 
niques to an attitude questionnaire. Educ. psychol. Measmt., 1948, 8, 215-224. 


4 For a further discussion of the importance of scale reproducibility, see Coombs’s
analysis of the “trait status” score (3).

5 In the present article the writers have attempted to call attention to the advantages 
and disadvantages of scale analysis methods in terms of the results obtained when these 
methods are used. That these methods do not, in practice, live up to the promise they 
show in theoretical terms may be due in part to the way in which scale analysis is done. 
For a discussion of this point see Clark and Kriedt (2). 







3. Coombs, C. H. Some hypotheses for the analysis of qualitative variables. Psychol.
Rev., 1948, 55, 167-174.

4. Dodd, S. C. A simple test for predicting opinions from their subclasses. Int. J.
Opin. Attitude Res., 1948, 2, 1-25.

5. Edwards, A. L. On Guttman’s scale analysis. Educ. psychol. Measmt., 1948, 8,
313-318.

6. Edwards, A. L., and Kilpatrick, F. P. A technique for the construction of attitude
scales. J. appl. Psychol., 1948, 32, 374-384.

7. Gage, N. L. Scaling and factorial design in opinion poll analysis. Purdue Univ.
Studies in Higher Educ. LXI, 1947. pp. vi + 87.

8. Guttman, L. The Cornell technique for scale and intensity analysis. Educ. psychol.
Measmt., 1947, 7, 247-280.

9. Jurgensen, C. E. Table for determining phi coefficients. Psychometrika, 1947, 12,
17-29.





The Airline Pilot’s Job * 


Thomas Gordon 
American Institute for Research, Pittsburgh, Pa. 


It is the purpose of this paper to report certain aspects of a study 
conducted by the Aviation Branch of the American Institute for Research 
under the auspices of the National Research Council Committee on
Aviation Psychology.1 Funds for the project were furnished by the
Civil Aeronautics Administration. This study, completed in November, 
1947, was undertaken (1) to study current methods of selecting and 
evaluating the airline pilot and (2) to determine the critical requirements 
of his job. It was intended that the data obtained in this investigation 
be used as a basis upon which to develop improved procedures for 
selecting, training, and certifying airline pilots. At present the American 
Institute for Research is utilizing the data as a basis for devising a 
radically new type of flight examination for pilots seeking the Airline 
Transport Rating certificate. This latter project is under the same 
sponsorship as the study to be described in this paper. 

In the first phase of the study the general procedure followed was to 
survey the available sources of information pertaining to present methods 
of selecting and evaluating airline pilots. In the second phase of the 
project the procedure was to survey sources of information about the 
critical requirements of the airline pilot’s job, an attempt being made 
to answer the question: “What behavior and characteristics are required
for handling the job safely and effectively?”


Methods of Selecting the Airline Pilot 


Methods of selecting applicants for the job of airline pilot were studied 
by examining the personnel records of 432 pilots from five major airline 
companies. The technique employed was to obtain the records of pilots 
who had been released by their companies because of lack of flying pro- 
ficiency during the period between initial hiring and the time when they 


* Parts of this paper were read at the Meeting of the Aero-Medical Association in 
Toronto, Canada, on June 17, 1948. The author’s report of the entire study has been 
published as Research Report No. 73 by the Civil Aeronautics Administration, Division 
of Research, Washington, D. C. (3). 

1 The writer is indebted to the members of this committee and to John C. Flanagan 
for their guidance during the study and to the members of the Aviation Branch of the 
American Institute for Research who assisted in conducting the study. 









would have qualified as an airline captain. These pilots constituted the 
experimental group (E-group). Then the records were obtained on a 
number of pilots who had not been eliminated but were currently em- 
ployed. These pilots constituted the control group (C-group). The two 
groups were matched on the basis of time of original hiring by the com- 
pany. Adequate data were available for both the experimental and 
control groups on eight variables of the type currently established by 
airline companies as selection requirements for pilot applicants. These 
variables were: (1) Age at time of hiring; (2) Previous education; (3) 
Otis Test I.Q. scores; (4) Bennett Test of Mechanical Comprehension
(Form AA) scores; (5) Minnesota Multiphasic Personality Inventory 
scores; (6) Previous flying hours; (7) Marital status; and (8) Previous 
ground training in aeronautical subjects. 

The experimental group and the control group were compared on 
each of these variables. Data were not available for all of the pilots in 
each group on each separate variable. The findings, summarized in
Table 1, show that on none of the eight variables was the difference
between the group of eliminated pilots (E-group) and the group of
successful pilots (C-group) statistically significant, even at the 5% level.
These results indicate rather conclusively that present requirements
established by airline companies for selection of applicants are not
adequate for predicting later success or failure with much confidence.
Furthermore, because none of the selection procedures differentiated 
between eliminated and successful pilots it was not possible to derive 
from these procedures any clues as to the critical requirements of the 
pilot’s job. 
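The t-ratios summarized in Table 1 can be checked against the usual definition, the mean difference divided by the standard error of the difference. The sketch below uses only rows in which all three values are legible in the scan, with decimal points restored where the scan dropped them (an assumption); the reported t-ratios carry no sign, so magnitudes are compared. The value 1.96 is the large-sample two-tailed critical value at the 5% level; the small MMPI groups would call for a somewhat larger one.

```python
# (mean difference C minus E, standard error of difference, reported t-ratio)
legible_rows = {
    "Age at time of employment": (-0.65, 0.50, 1.297),
    "Otis I.Q.": (2.40, 1.36, 1.762),
    "MMPI Schizophrenia scale": (0.87, 1.60, 0.544),
    "MMPI Hypomania scale": (2.81, 1.44, 1.957),
    "Previous flying hours": (14.7, 128.92, 0.113),
}


def t_ratio(diff, se):
    """Magnitude of the mean difference in standard-error units."""
    return abs(diff) / se


recomputed = {name: t_ratio(d, se) for name, (d, se, _) in legible_rows.items()}
largest_gap = max(
    abs(recomputed[name] - reported)
    for name, (_, _, reported) in legible_rows.items()
)

for name, t in recomputed.items():
    print(f"{name}: recomputed {t:.3f}, reported {legible_rows[name][2]}")
```

The recomputed ratios agree with the reported ones to within rounding of the published differences and standard errors.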


Methods of Evaluating the Airline Pilot 


The methods and procedures used by airlines for evaluating their 
pilots also were surveyed in this study. Information pertaining to 
evaluation procedures was obtained primarily through examination of 
company records of the flight performance and ground school achievement 
of both eliminated and currently employed pilots and through scrutiny 
of the flight examinations used by airlines. Pilots’ and check-pilots’ 
attitudes toward present methods of evaluation and their suggestions for 
improvement were obtained through individual interviews. The findings 
in regard to methods of evaluation currently used by airline companies 
can be summarized as follows: 


1. There exists a great amount of variation between airline companies 
as to the adequacy of the training records maintained on their pilots. 
There were practically no records of flight tests in the files of some of the 
pilots. 







Table 1

Comparison of Eliminated and Successful Airline Pilot Trainees on Selection
Requirements Established by Airline Companies

                                      Number of Pilots       Mean          Standard
                                      E-        C-           Difference    Error of
Selection Requirements                group     group        (C minus E)   Difference   t-ratio*

1. Age at Time of Employment          169       166          -.65 yrs.     .50          1.297
2. Amount of Education at Time
   of Employment Beyond High
   School                             170; other illegible   .18 yrs.      .24          .738
3. Otis I.Q.'s                        63; other illegible    2.40          1.36         1.762
4. Bennett Test of Mechanical
   Comprehension Scores
   (Form AA)                          [illegible]            [illegible]   7.24         .848
5. Minnesota Multiphasic Per-
   sonality Inventory Scores:
     Lie Scale                        16        16           [illegible]   1.77         .566
     Validity Scale                   15        15           [illegible]   1.13         .294
     Hypochondriasis Scale            15        15           [illegible]   1.49         .222
     Aggression Scale                 16        16           [illegible]   2.91         .514
     Hysteria Scale                   18        18           [illegible]   1.81         .553
     Psychopath. Deviate Scale        15        15           [illegible]   3.07         1.130
     Interest Scale                   17        17           [illegible]   3.03         .564
     Paranoia Scale                   17        17           [illegible]   1.65         .780
     Psychasthenia Scale              15        15           [illegible]   2.29         .000
     Schizophrenia Scale              15        15           .87           1.60         .544
     Hypomania Scale                  16        16           2.81          1.44         1.957
6. Number of Previous Flying
   Hours                              165       171          14.7 hrs.     128.92       .113
7. Marital Status                     170       168          (117 married in E-group, 119 in C-group)
8. Previous Training in Aero-
   nautical Subjects                  214       214          (134 with previous training in both
                                                             E-group and C-group)

*None of the mean differences were statistically significant at the 5% level of
significance.


2. All of the airlines rely upon periodic flight examinations, called 
flight-checks, for obtaining evaluations of their pilots. The maneuvers 
that make up the flight-check vary from one airline to another and vary 
somewhat within a single airline from one check-pilot to another. 

3. In general, on these flight-checks a pilot is rated against the 
standard set up by the particular check-pilot rather than against an 
objective standard. For example, pilots are usually rated on a scale, 
such as “Standard-Substandard,” “Good-Average-Below Average,” or







“1-2-3-4-5.” For a few maneuvers, however, some airlines have tried to
establish more objective standards in the form of “limits” for altitude, 
airspeed or heading, within which the examinee is required to keep the 
plane in order to achieve a passing rating. Even the application of these 
limits was found to vary among check-pilots within a single airline. 

4. The “halo effect” was found to be operating in the ratings of flight
performance. From examination of the records of past flight-checks, it 
was found that when a pilot received a below-average rating on one 
maneuver there was a very strong tendency for him to get below-average 
ratings on all subsequent maneuvers. The ratings did not differentiate 
between the pilots’ strengths and weaknesses. In other words, they are 
not useful as diagnostic performance records. One explanation of the 
lack of discrimination of the ratings on different maneuvers is the fact 
that it is common practice to make no record of the pilot’s performance 
during the flight. The flight-test forms are usually filled out after the 
flight, thus increasing the chance that the quality of a pilot’s performance 
on specific maneuvers might be forgotten. 

The discovery of the inadequacy of records of performance, the lack 
of standardized evaluation procedures and the subjectivity of the meas- 
ures of proficiency parallels the findings of Army Air Force psychologists 
whose research in the area of pilot proficiency measures is reported in the 
volume edited by Miller (5). Similar findings have been reported 
in a study of the flight-checks used by the Civil Aeronautics Adminis- 
tration (1). 


The Critical Requirements of the Job 


A second objective of the study was to determine the critical require- 
ments of the job of airline pilot. This involved an analysis of the job 
with particular emphasis upon isolating those job requirements which 
are the most critical. In this approach, “critical requirements” are 
defined as those job requirements, expressed in behavioral terms, which 
have proved to be important factors in differentiating successful or un- 
successful performance on the job. The assumption underlying this 
approach is that the most critical differences between the safe and effective 
pilot and the one who is not will be revealed by focusing the job analysis 
upon situations where the behavior of pilots has been shown to make a 
difference. To use a common expression of pilots, the critical requirement
approach attempts to determine “what separates the men from
the boys.” Flanagan (2) has described the use of this job analysis
method in the study of causes of mission failures in the Army Air Forces, 
and he has stated that such a determination of critical requirements is 
the principal objective of job analysis procedures. 










Although this approach resembles other methods of job analysis, it 
also differs from them in an important respect. It is common practice 
in most job analysis approaches to collect long lists of job requirements, 
after which it is necessary to submit them to experts (usually psycholo- 
gists, supervisors, top management) for judgments of their relative im- 
portance for success on the job. The critical requirement approach, 
however, yields at the outset only the critical requirements, and it relies 
more upon the participants on the job for judgments of what is critical 
or upon actual records of situations where behavior has been critical. 
For example, in this study we relied upon the following sources of infor- 
mation: 


1. An analysis was made of the records of all scheduled domestic airline 
accidents, during the period of 1938 through 1946, in which the behavior 
of the pilot was judged a contributing factor in the accident. From each 
of 121 such accident reports we extracted a description of the specific 
behavior of the pilot prior to and during the accident and the circum- 
stances leading up to the accident. 

2. Interviews were conducted with airline pilots and check-pilots for 
the purpose of obtaining a larger sample of critical incidents than was 
provided by the accident reports. Questions were devised which would 
yield examples of critical situations rather than commonplace or everyday
occurrences. This “critical incident technique” required the pilots
to recall recent events or incidents in which they did something which 
created an unsafe situation, thus minimizing discussions of traits or 
stereotyped opinions as to the requirements of the job. Examples of 
“critical incident” questions used are:


a. “Probably all pilots who have flown a lot have done something at one
time or another that got them into an uncomfortable situation or even a near- 
accident. We would like to get several examples of such things you have 
done. First, could you describe the most recent situation in which you did 
something like this and tell me just what you did?” 

b. “Now, I would like for you to recall the last time you had to take over
the controls from a co-pilot because you felt the situation was pretty critical. 
Could you describe that situation and tell me just what the co-pilot did or 
might have done if you hadn’t taken over?” 

c. “We would like to draw on your experience as a check-pilot to get
examples of what pilots do on check-rides. Would you think back on the last 
pilot you failed on a check-ride and tell me exactly what he did which caused 
you to fail him?” 


From questions such as these we obtained 333 usable incidents from 
270 interviews. Interviewing was done in 18 cities with pilots from 27 
different scheduled and non-scheduled airline companies. The pilots 
were selected in a fairly random manner. The determining factor for 
selection generally was the presence of the pilots at the airport in prepara- 







tion for a flight or at the completion of a flight on the particular days 
the interviewers visited the airport. The questions were standard for 
each interview and the interviewers wrote down the responses of the 
pilots on standard forms. An example of the kind of incidents ob- 
tained is the following: 


“On daytime flight from New York to Miami in DC-3, the weather was 
clear with wind gusts up to 50 mph. They were landing at Raleigh with 
approximately 35 mph wind about 45° across runway. The co-pilot was
landing the plane from the left seat. He came in too slow and was just about 
to touch the runway going sideways and with the downwind wing dangerously 
low. The captain was afraid the co-pilot might land hard enough going 
sideways to buckle the landing gear or get the wing low enough to cause a 
ground loop. The captain took the controls, added power, corrected for drift 
and landed OK. The captain stated that his co-pilot was inexperienced.” 


The next step in the analysis involved extracting from each incident 
the specific pilot acts contributing to the accident or near-accident. For 
example, in the incident above, the following critical pilot acts were 
extracted: (1) Executed landing approach at too low an airspeed and 
(2) Drifted or was not aligned with runway during round-out. The 
above step yielded 787 specific pilot acts, each of which had contributed 
to an accident or an unsafe situation. This group of acts was subjected 
to a further analysis in which all the acts were sorted into 21 smaller 
groups or clusters of homogeneous acts. These clusters made up or 
defined the critical components of the job of airline pilot. For example, 
it was found that there were four categories of errors all having to do 
with the operation of controls and switches: 41 instances in which
pilots forgot to operate a control or switch, 31 of confusing two controls 
or switches, 14 of improperly adjusting a control or moving a switch in 
the wrong direction, and 6 of inadvertent operation of a control or switch. 
These four different kinds of errors of operating controls and switches 
formed a cluster which defined a specific job component. Twenty-one 
such components were extracted from the data. 
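The tallying that turns individual pilot acts into a job component can be sketched with the one cluster for which the text gives counts. The dictionary labels paraphrase the four error categories named above; frequencies for the other twenty components are not given in this part of the text and are not supplied here.

```python
# Error categories in the "operating controls and switches" cluster,
# with the frequencies reported in the text.
controls_and_switches = {
    "forgot to operate a control or switch": 41,
    "confused two controls or switches": 31,
    "improperly adjusted a control or moved a switch the wrong way": 14,
    "inadvertently operated a control or switch": 6,
}

TOTAL_ACTS = 787   # specific pilot acts extracted from all incidents

component_total = sum(controls_and_switches.values())
share_of_all_acts = component_total / TOTAL_ACTS

# Rank the categories within the cluster by frequency, highest first.
ranked = sorted(controls_and_switches.items(), key=lambda kv: kv[1], reverse=True)

print(component_total, f"{share_of_all_acts:.1%}")
for label, count in ranked:
    print(f"{count:3d}  {label}")
```

Summing each cluster in this way and sorting the cluster totals would give a frequency ranking of components of the kind the text goes on to describe.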

As would be expected, it was found that certain components ranked 
higher than others as judged by the frequency of the specific pilot errors 
classified in the particular components. 

The components of the airline pilot’s job found most critical are 
shown in Table 2. In column (a) are listed the 21 critical requirements 
or components of the job which were obtained by classifying or grouping 
similar pilot errors extracted from three sources: (1) from analysis of air- 
line accidents, (2) from analysis of incidents or near-accidents experienced 
by airline pilots, and (3) from analysis of pilot errors reported by check- 
pilots as reasons for failing pilots or for taking over controls from pilots 
on check-rides or flight examinations. The frequencies with which errors 







Table 2 

Critical Requirements of the Job of Airline Pilot Determined by Frequency of Errors 
Extracted from Accident Reports, Critical Incidents and Flight-Checks 

Columns: (a) Critical Requirements; Frequency of Errors: (b) Accidents, (c) Incidents, (d) Flight-checks, (e) Total 

 1. Establishing and maintaining angle of glide, rate of descent, and gliding speed on approach to landing 
 2. Operating controls and switches 
 3. Navigating and orienting 
 4. Maintaining safe airspeed and attitude, recovering from stalls and spins 
 5. Following instrument flight procedures and observing instrument flight regulations 
 6. Carrying out cockpit procedures and routines 
 7. Establishing and maintaining alignment with runway on approach or takeoff climb 
 8. Attending, remaining alert, maintaining lookout 
 9. Utilizing and applying essential pilot information 
10. Reading, checking and observing instruments, dials and gauges 
11. Preparing and planning of flight 
12. Judging type of landing or recovering from missed or poor landing 
13. Breaking angle of glide on landing 
14. Obtaining and utilizing instructions and information from control personnel 
15. Reacting in an organized manner to unusual or emergency situations 
16. Operating plane safely on ground 
17. Flying with precision and accuracy 
18. Operating and attending to radio 
19. Handling of controls smoothly and with coordination 
20. Preventing plane from undue stress 
21. Taking safety precautions 





were obtained from each of these three sources are shown in columns (b), 
(c), and (d). The total frequencies of errors from all sources are shown 
in column (e). 

Table 3 presents the correlations between the rank order of the critical 
requirements as determined from the frequencies of pilot errors obtained 







from the three sources. These were computed in order to answer the 
question: “To what extent do you obtain similar indices of the relative 
‘critical-ness’ of the various job requirements from analyses of critical 
behavior in accidents, in incidents or near-accidents and on flight-checks?” 


Table 3 

Correlations Between Rank Order of Critical Requirements as Determined 
from Three Sources of Pilot Behavior 
(Spearman Rho Coefficients) 

                Incidents              Flight-checks 

Accidents       .41 (S.E. = .12)*      -.04 (S.E. = .23) 
Incidents                               .28 (S.E. = .22) 

* An r of .43 is necessary for significance at the 5% level and an r of .55 for 
significance at the 1% level (4). 





The high positive correlation between the rank order of the critical 
requirements determined from analysis of airline accidents and from 
analysis of the incidents reported by pilots indicates that with the critical 
incident technique we accomplished the objective of obtaining job require- 
ments which are critical from the standpoint of safe flying. The low 
negative correlation between the rank order of the critical requirements 
as determined by analysis of accidents and as determined by analysis of 
behavior of pilots on flight-checks might be interpreted as an indication 
that check-pilots’ reasons for failing pilots on flight-checks are not closely 
related to the requirements of the job which seem to make a difference 
between safe and unsafe airline flying. It may well be that the present 
flight-checks do not provide an adequate evaluation of the extent to 
which pilots demonstrate proficiency in the most critical aspects of airline 
flying. Check-pilots seem to be emphasizing proficiency in different 
aspects of the job, such as flying the plane smoothly and keeping within 
very precise limits of altitude, airspeed and heading. These require- 
ments, of course, are probably important from the standpoint of the com- 
fort of passengers, but at the same time they shouldn’t be emphasized at 
the expense of neglecting requirements which are more critical from the 
standpoint of safety. 
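
The rank-order comparison reported in Table 3 can be sketched as follows. This is a minimal illustration of the Spearman rank-difference formula for untied ranks, not a reconstruction of the original computation. The significance thresholds (.43 and .55) are those quoted in the footnote to Table 3; the example ranks and the function names are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rho for two rankings of the same items, assuming no tied ranks."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rankings of equal length, n >= 2")
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def significance_label(rho, r_05=0.43, r_01=0.55):
    """Compare |rho| with the 5% and 1% values quoted in the Table 3 footnote."""
    if abs(rho) >= r_01:
        return "significant at the 1% level"
    if abs(rho) >= r_05:
        return "significant at the 5% level"
    return "not significant at the 5% level"


if __name__ == "__main__":
    # Hypothetical ranks of five requirements from two sources (illustration only).
    accident_ranks = [1, 2, 3, 4, 5]
    incident_ranks = [2, 1, 3, 5, 4]
    print(spearman_rho(accident_ranks, incident_ranks))
```

With 21 requirements, as in Table 2, the two lists would each hold the ranks 1 through 21; tied frequencies would call for averaged ranks and a correction that this sketch omits.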


Summary 


The lack of statistically reliable differences between eliminated and 
successful pilots indicates that present methods of selection do not predict 
success or failure in training. To achieve this, it is probably necessary 
for airlines to utilize new procedures which have been validated against 







this or some similar criterion, rather than rely on standardized tests 
and interview procedures which, although of possible usefulness for 
predicting success in other fields, have not been validated as predictors 
of success in airline piloting. Furthermore, it would appear from this 
survey that training and proficiency records of airline companies are 
inadequate for use as criteria of proficiency or for providing a diagnostic 
picture of the proficiencies of their pilots. The findings also suggest 
that the subjective type of flight-check currently employed by airlines 
cannot adequately provide an objective evaluation of the extent to which 
pilots meet the most critical requirements of the job. 

From the results of the analysis of pilot errors extracted from various 
critical incidents it has been shown which components of the pilot’s job 
are most critical from the standpoint of safety and effectiveness on the job. 
The implications of these results are rather obvious, but some may be 
mentioned briefly: 


1. The most critical requirements should receive more emphasis by 
check-pilots who evaluate the proficiency of pilots on flight examinations. 
This study suggests that special emphasis is needed on the landing 
approach, accurate operation of controls, methods of navigating and 
orienting, maintaining safe airspeeds, and compensating for drift. These 
data are now being used as a basis for developing a more objective flight 
examination, as mentioned earlier. 

2. The findings suggest a need for improved cockpit design to simplify 
the pilot’s job and reduce the possibility of such errors as: confusing two 
controls, making improper adjustments of controls and inadvertently 
operating controls. 

3. The findings might be used to suggest which components of the job 
need greater emphasis in the training program for pilots. Pilot trainees 
also could be informed which errors are most frequently made in airline 
flying. 

4. The list of critical requirements should prove useful in devising im- 
proved methods of selecting pilots, inasmuch as they provide valuable 
clues as to the critical aptitudes needed by a safe airline pilot. 


Finally, it would seem that the results of this study furnish evidence 
that the “critical incident technique” is a very useful method of isolating 
the critical requirements of a particular job. This method increased 
the size of the sample of critical incidents, which if restricted to accidents 
alone would have been too small to yield a sufficient number of pilot errors 
upon which to base the list of critical requirements. 


Received August 22, 1948. 







References 


1. Festinger, L., Kogan, L. S., Odbert, H. S., and Wapner, S. An analysis of inspectors’ 
ratings of check-flights as recorded on ACA 342Z. Washington: CAA Division of 
Research, Report No. 58, March 1946. 

2. Flanagan, John C. (Ed.). The aviation psychology program in the Army Air Forces. 
Washington: U. S. Government Printing Office, 1948. (AAF Aviation Psychology 
Program Research Report No. 1.) 

3. Gordon, T. The airline pilot: a survey of the critical requirements of his job and of pilot 
evaluation and selection procedures. Washington: CAA Division of Research, 
Report No. 73, November 1947. 

4. Guilford, J. P. Fundamental statistics in psychology and education. New York: 
McGraw-Hill, 1942. 

5. Miller, N. E. (Ed.). Psychological research on pilot training. Washington: U. S. 
Government Printing Office, 1947. (AAF Aviation Psychology Program 
Research Report No. 8.) 





Factors Related to Life Insurance Selling * 


D. F. Kahn and J. M. Hadley 
Division of Applied Psychology, Purdue University 


The purpose of the study has been three-fold: first, to determine the 
degree of relationship that exists between relative success in the early 
period of selling life insurance and success at a later period; second, to 
examine various selling activities with a view to uncovering certain 
factors which differentiate successful from unsuccessful agents, and to 
select such factors as might contribute to the refinement of life-insurance 
training programs; third, to investigate further certain personal history 
items and personality traits already known to correlate with success in 
selling life insurance, and to analyze other measurable areas of personality, 
with the aim of increasing the sensitivity of existing selection methods. 
The identification of individuals for whom the likelihood of success is 
known would not only benefit management, but would, to some extent, 
minimize feelings of frustration on the part of the agent who, from the 
outset, may be doomed to failure. 


Procedure 


The subjects considered in the present investigation were a group of 
84 new life insurance agents who had attended Class I and Class II of the 
Purdue Course in Life Insurance Marketing (1). Each subject selected 
for study had received a five-week basic course at the school and had 
also completed thirteen weeks of selling in the field. Production records 
for the most part were collected during the year 1946.¹ The salesmen 
represented 19 life insurance companies and 22 states (3). Well over 


* This article is a condensation of the senior author’s dissertation of the same title 
and completed under the direction of the junior author. The dissertation was sub- 
mitted to the faculty of Purdue University in partial fulfilment of the requirements 
for the Degree of Doctor of Philosophy, August 1948, and is on file in the Purdue 
University Libraries. 

¹ The Purdue Course is a one-year plan divided into 15 weeks of classroom training 
and approximately 37 weeks of supervised field work. The 15-week in-residence 
training is divided into three five-week sessions. The first session takes place before 
any actual selling is done; the other two interrupt the 37-week selling period at intervals 
of from about 12 to 16 weeks each. Weekly records are forwarded to the school by 
each salesman’s agency manager during the field period. We are deeply indebted to 
the director of the Purdue Course in Life Insurance Marketing, Mr. D. P. Cahill, for 
giving so freely of advice and assistance and for making available the data upon which 
this study is based. 



one-half of the group were veterans attending the school under public 
laws affording veterans educational advantages. 

Data were collected in three major areas: selling activities, personal 
history items, and psychological measures. 

Selling activities included: (1) size of application written; (2) number 
of calls made, a call being defined as a face-to-face conversation with a 
potential buyer; (3) number asked to buy, “asked to buy” being defined 
as a salesman’s discussing life insurance with a client where the latter 
is asked to take action toward securing a policy; and (4) number of 
applications written, an application being defined as a signed application 
for a life insurance policy. 

Relationships between calls, “asked to buy,” and applications written 
were investigated under the following headings: (1) percentage of applications 
written to total number of calls made, (2) percentage of persons 
asked to buy to total number of calls made, (3) percentage of applications 
written to total number of persons “asked to buy.” Because of 
the unequal lengths of reported field work for Class I and Class II, 37 and 
39 weeks respectively, and further because some of the agents, for one 
reason or another, withdrew from the course before completion, the 
measures (calls, “asked to buy,” number of applications written, and 
production) were computed on the basis of a weekly average. 
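
The weekly averages and the three percentages just defined can be sketched as below. This is an illustrative computation only: the record layout, the field names, and the example figures are hypothetical and are not taken from the Purdue data.

```python
from dataclasses import dataclass


@dataclass
class FieldRecord:
    """Totals reported by one agent over his weeks in the field (hypothetical layout)."""
    weeks_reported: int
    calls: int
    asked_to_buy: int
    applications: int
    production_dollars: float


def weekly_average(total, weeks):
    """Reduce a total to a per-week figure so agents with unequal field periods compare."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return total / weeks


def selling_ratios(rec):
    """The three percentage relationships among calls, 'asked to buy', and applications."""
    return {
        "pct_applications_to_calls": 100 * rec.applications / rec.calls,
        "pct_asked_to_calls": 100 * rec.asked_to_buy / rec.calls,
        "pct_applications_to_asked": 100 * rec.applications / rec.asked_to_buy,
    }


if __name__ == "__main__":
    # Hypothetical agent reporting for 37 weeks.
    agent = FieldRecord(weeks_reported=37, calls=740, asked_to_buy=296,
                        applications=74, production_dollars=370000.0)
    print(weekly_average(agent.calls, agent.weeks_reported))
    print(selling_ratios(agent))
```

Because the percentages are ratios of totals, they are unaffected by the length of the field period; only the count and dollar measures need the weekly reduction.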

During the initial five-week training period a personal history ques- 
tionnaire and a battery of psychological tests were administered. Per- 
sonal history items analyzed were: (1) age, (2) number of dependents, (3) 
living expense per month, and (4) life insurance owned, including National 
Service Life Insurance. 

Psychological measures investigated included: (1) Kuder Preference 
Record (5), (2) Guilford-Martin Personnel Inventory Number I (2), (3) 
the previously mentioned test, used to measure degree of uncertainty as 
determined from the number of questions responded to as undecided, 
“?”, (4) The Adaptability Test (8), (5) Part II² of the Aptitude Index (7). 

The measure adopted in the present study as the criterion of success 
was the production records on file with the Purdue insurance school; 
these records were based on the total of signed applications, that is, 
written business, and not the total of business signed, examined and paid 

² Letter grades reported on the Aptitude Index refer to Part II only, and should not 
be confused with the letter grades usually cited for the entire instrument, and which 
are derived from an age-weighted combination of Part I and Part II of the Index. 
A special form, Part III, that is to say, the Personal History portion of the Aptitude 
Index which was devised for use with former service men, was administered but not 
incorporated into this study because of the inaccuracy and the incompleteness of response 
to it. Three of the items appearing on Part I of the Index, number of dependents, 
living expenses per month, and amount of life insurance owned, however, have been 
considered separately in this study. 







for, which is generally referred to as paid-for business. Varying lengths 
of school attendance necessitated reducing total production to average 
weekly production in order to evaluate the relative success of each agent. 

In an endeavor to ascertain the relationships between early pro- 
duction and later production, two correlations were computed. The 
first of these considered the relationship between the average weekly 
production for the first 13 weeks and the average weekly production for 
the time spent in the field over and above those 13 weeks. Two agents 
who failed to report to the school after the thirteenth week were elimi- 
nated from this correlation, and thus 82 agents were left who had com- 
pleted from 15 weeks to the maximum school period of 39 weeks in the 
field. Approximately 69 per cent of this group had completed the course. 
The second correlation computed was a measure of the same type of 
relationship; however, this time only those salesmen who had reported a 
minimum of 26 weeks were considered. It was felt that whatever rela- 
tionship would be found to exist in the latter correlation would be a more 
accurate reflection of differences between early and late selling, since the 
first 13 weeks would be compared with a selling period of equal or greater 
length. Sixty-five cases meeting such a requirement were found, approxi- 
mately 87 per cent of whom had completed the course. 

In order to fulfill the second and third purposes of this study, that is, 
to determine whether or not the measures selected would differentiate 
successful from unsuccessful salesmen, two such contrasting groups of 
agents were identified. In making such a distinction, the agents were, in 
the first place, ranked from high to low on their total sales while attending 
the school. The production records covering the duration of the school 
term for those agents who, for one reason or another, did not complete 
the course, but did continue to sell insurance, were obtained from the 
respective agency managers under whom the subjects were selling. 
These records were used as checks against the agents’ weekly reports of 
production made to the school, and were thus used to substantiate the 
original rank order of the agents in the study. Six of the group of 84 
salesmen withdrew from the school at an early date after the first 13- 
week period, and hence their records were incomplete. These agents had 
either terminated with their companies, or had continued as life insurance 
salesmen, but for various reasons further records on their selling activities 
were not made available. It was felt that the inclusion of these men in 
the analysis would, to some extent, invalidate the rather stable ranking 
of the remaining 78 agents. For these reasons six agents were omitted 
from this part of the study. The ranked average weekly production 
records were divided into three equal groups, numbering 26 agents each. 
The high and the low groups (average weekly production $8,602 and 
$2,181 respectively) were designated as successful and unsuccessful. 
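
The formation of the criterion groups can be sketched as follows: agents are ranked on average weekly production and cut into three equal groups, the top and bottom thirds serving as the contrasting groups. The function name and the example figures are hypothetical; only the group sizes (78 agents, 26 per group) come from the text.

```python
def split_into_thirds(weekly_production):
    """Rank agents by average weekly production and return (high, middle, low) groups.

    weekly_production maps an agent identifier to his average weekly production.
    Any remainder after division by three is left in the middle group.
    """
    if len(weekly_production) < 3:
        raise ValueError("need at least three agents")
    ranked = sorted(weekly_production, key=weekly_production.get, reverse=True)
    size = len(ranked) // 3
    high = ranked[:size]
    low = ranked[-size:]
    middle = ranked[size:len(ranked) - size]
    return high, middle, low


if __name__ == "__main__":
    # Hypothetical production figures for 78 agents (illustration only).
    production = {agent: 1000 + 100 * agent for agent in range(78)}
    high, middle, low = split_into_thirds(production)
    print(len(high), len(middle), len(low))
```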







In an attempt to locate new test items which might be of value in 
future selection devices an item analysis was undertaken on the 150 
questions appearing on the Guilford-Martin Personnel Inventory Num- 
ber I. The D-value method based on Lawshe’s nomograph (6) adapted 
from the Kelley technique (4) was employed in this part of the procedure. 
All items responded to by the agents as uncertain, “?”, were grouped 
with the “No” responses. 


Results 


A correlation of +.61, based on the records of 82 agents, was obtained 
between average weekly production for the first 13 weeks and average 
weekly production for a period of from two to 26 weeks beyond the initial 
period. For the group of 65 agents who had completed at least a second 
13-week selling period, approximately 87 per cent of whom had reported 
records for the entire 37 or 39 week course, a correlation between the 
average weekly production for the first 13 weeks and the average weekly 
production for at least a second 13 weeks was found to be +.55. Both 
of the above-mentioned correlations are significant beyond the one per 
cent level of confidence. 
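
One common way of checking such a claim is to convert each correlation to a t ratio with n - 2 degrees of freedom. The sketch below applies that conversion to the two coefficients and sample sizes reported above. The text does not state which test the authors used, so this is offered only as an illustration, and the function name is hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t ratio for testing a product-moment r against zero, with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Values reported in the text: r = .61 with 82 agents, r = .55 with 65 agents.
    print(round(t_for_correlation(0.61, 82), 2))
    print(round(t_for_correlation(0.55, 65), 2))
```

Both ratios come out near 6.9 and 5.2 respectively, far above the two-tailed one per cent value of roughly 2.6 to 2.7 for these degrees of freedom, which agrees with the statement in the text.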

Analysis of the selling activities measured revealed that several signifi- 
cant differences existed between the groups of successful and unsuccessful 
salesmen. The successful salesmen were higher in every comparison 
except the percentage of prospects asked to buy. They asked more 
people to buy but this can be explained by the fact that they averaged 
more calls per week. Although both groups of agents asked approxi- 
mately 37 persons to buy insurance out of each 100 calls made, the 
successful salesmen sold insurance to approximately 31 per cent of such 
prospects as contrasted with the unsuccessful salesmen who sold to 
approximately 17 per cent. 

Although success in insurance selling is, in part, determined by the 
denomination of the applications written by a salesman, other factors 
are also important. While it is true that the average size of policy 
written by the high and low groups of this study is different, the difference 
between these averages only partially accounts for the difference between 
the two groups with respect to the average weekly production figures. 
Analysis revealed that the successful salesmen were actually able to sell 
to a significantly larger percentage of persons called upon. In view of 
the average number of applications written per week by both groups, the 
successful group of salesmen would have been able to produce over twice 
as much insurance written as the unsuccessful group, even if the size of 
the application written had been exactly the same for both groups. The 
successful group was therefore able, in terms of the number of persons 
to whom they sold alone, to do more selling than the unsuccessful group. 







Table 1 

Comparisons of Mean Differences Between High and Low Producing Groups of Agents 
in Various Selling Activities and Personal History Items 

                                             Mean of  No. of Cases  Mean of  No. of Cases           S.E. 
                                             High     High          Low      Low           Mean     of 
Item                                         Group    Group         Group    Group         Diff.    Diff.   C.R. 

Average Weekly Production (in dollars)       8,602    26            2,181    26            6,421    546     11.76 
Size of Application (in dollars)             5,958    26            3,538    26            2,419    784      3.08 
Average No. of Applications per Week         1.66                   .74      26            .93      .14      6.70 
Average No. of Cases per Week                18.68                           26                              2.98 
Average No. Asked-to-Buy per Week            6.86                            26                              1.60 
% of Applications to Total No. of Cases      9.66                            26 
% Asked-to-Buy to Total No. of Cases         37.50 
% of Applications to Total No. Asked-to-Buy  30.87    26 
Age*                                         30.27    26 
Dependents*                                  1.46     26 
Monthly Living Expenses (in dollars)*        209      26 
Life Insurance Owned (in dollars)*           16,544   25 

* At entry into life insurance business. 


Differences between the two groups in question with respect to per- 
sonal history items revealed, as may be seen in Table 1, that of the four 
items analyzed only one, namely the amount of life insurance owned at 
the time of entry into selling, resulted in a critical ratio significant beyond 
the one per cent level of confidence. Nevertheless, the average agent in 
the successful group was found to be older, to have a greater number of 
dependents, and to have a higher standard of living as determined by 
living expenses per month. It is believed that investigation into per- 
sonal history items that appear among various selection devices would 
reveal that the age factor is closely related to several other items com- 
monly employed in typical questionnaires such as number of dependents, 
amount of insurance owned and so forth. 
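
The critical ratios in Table 1 are the mean difference divided by the standard error of the difference. The sketch below checks two rows of the table on that basis and, as an added assumption, converts a critical ratio to a two-tailed probability by treating it as a normal deviate. The function names are hypothetical.

```python
import math


def critical_ratio(mean_diff, se_diff):
    """C.R. = difference between means divided by the standard error of that difference."""
    if se_diff <= 0:
        raise ValueError("standard error must be positive")
    return mean_diff / se_diff


def two_tailed_p(cr):
    """Two-tailed probability, assuming the critical ratio behaves as a normal deviate."""
    return math.erfc(abs(cr) / math.sqrt(2))


if __name__ == "__main__":
    # Rows from Table 1: average weekly production, and size of application.
    print(round(critical_ratio(6421, 546), 2))
    print(critical_ratio(2419, 784))
    print(two_tailed_p(1.96))
```

The first row reproduces the tabled 11.76. The second gives about 3.09 against the tabled 3.08, a discrepancy that presumably reflects rounding in the published means and standard error.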

Table 2 shows differences between mean scores for the high and the 







low producing groups of agents for each of the nine areas dealt with by the 
Kuder Preference test, and gives further the critical ratios for the signifi- 
cance of the differences between the means obtained in these areas. The 
highest critical ratio, 2.81, was obtained in the area entitled “Clerical.” 
The average successful salesman scored significantly lower in this area 
than did the unsuccessful. The critical ratio for the persuasive com- 
ponent was found to be only 1.92, thus significant at approximately the 
five per cent level of confidence. 

As evidenced by the critical ratios appearing in Table 2 no differences 
significant beyond the ten per cent level of confidence were found to exist 
between the mean scores for the high- and low-producing salesmen for traits 
measured by the Guilford-Martin Personnel Inventory Number I. 

A critical ratio of 1.57, as shown in Table 2, resulted from a testing of 
the significance of the difference between the average number of 
question-mark responses given by the high- and low-producing groups. 


Table 2 

Comparison of Mean Differences Between High and Low Producing Groups of 
Agents in Various Psychological Measures 

                                    Mean Raw  No. of   Mean Raw  No. of 
                                    Score of  Cases    Score of  Cases             S.E. 
                                    High      High     Low       Low      Mean     of 
Psychological Measures              Group     Group    Group     Group    Diff.    Diff.   C.R. 

(Kuder Preference Record) 
Mechanical                          63.73     26       57.87     23        5.86 
Computational                       29.88     26       32.87     23       -2.99 
Scientific                          47.38              45.52     23        1.86 
Persuasive                          112.12             104.22    23        7.90            1.92 
Artistic                            41.46              37.17     23        4.29 
Literary                            48.31              53.70     23       -5.39 
Musical                             21.96              25.65     23       -3.70 
Social Service                      78.04              76.43     23        1.60 
Clerical                            47.96              56.09     23       -8.13            2.81 
(Guilford-Martin I) 
Objectivity                         49.79              54.67     24       -4.88 
Agreeableness                       30.63              31.96              -1.33 
Cooperativeness                     68.17              68.00                .17 
Degree of Uncertainty*              8.42               12.83              -4.42            1.57 
Intelligence (Adaptability Test)    21.32              22.08               -.76 
Aptitude Index Part Two             46.08     26       42.28     25        3.80    2.34    1.62 

* As determined by the “?” count on the Guilford-Martin Personnel Inventory 
No. I. 







It was noted that two men in the low-producing group answered at least 
forty out of a total of 150 questions in the Inventory as undecided, which 
is an unusually large number in terms of the general distribution on the 
Inventory. However, what may prove to be an important finding is 
the fact that, when the entire group of agents is considered, the three 
men who received a score of 40 or over in question-mark responses 
had a mean weekly production of $2,689, as contrasted with the 
corresponding mean of $5,281 for the 66 agents who scored 
at 31 and below. There were, moreover, no scores between 30 and 40 
for the whole group of agents in question. Although the number of 
agents in this study who scored unusually high on the measure designated 
as undecided is too small to allow one to place much confidence in it as an 
absolute finding, it is believed that further investigation along these lines 
with larger samples might prove fruitful. 

Even though no significant differences were found to exist between 
the high and the low groups of salesmen when measured by the scores 
derived from the Guilford-Martin Personnel Inventory, an item analysis 
was undertaken in the hope of uncovering certain items that might 
possibly discriminate between the two groups of salesmen. The eight 
items producing the highest D-Values were: 42, 48, 77, 83, 99, 103, 135, 
139. To all of these items, with the exception of item 83, the successful 
group of salesmen responded with a higher percentage of “Yes” answers 
than did the unsuccessful group of salesmen. Only four of these items 
were found to be significant beyond the five per cent level of confidence. 
These were items 42, 77, 135 and 139, having respective critical ratios of 
2.73, 2.14, 2.11, and 2.82, reflecting the significance of the differences 
between the percentages of “Yes” responses to each item for the high and 
low producing groups of salesmen. However, four such items, significant 
at the five per cent level, would be expected to occur in a test of 150 items 
by chance alone. Still it is quite likely that one or more of these items 
might well continue to be discriminating and reliable items. Further 
investigation would probably shed more light on this question. 
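
The chance-expectation argument can be made concrete with a short sketch. With 150 items each tested at the five per cent level, the expected number of items reaching significance by chance is 150 × .05 = 7.5, so four significant items falls within that expectation. The binomial probability below additionally assumes that the items behave independently, an assumption that is not made in the text and that is unlikely to hold exactly for inventory items; the function names are hypothetical.

```python
import math


def expected_chance_hits(n_items, alpha):
    """Expected number of items significant at level alpha when no item truly discriminates."""
    return n_items * alpha


def prob_at_least(k, n_items, alpha):
    """P(at least k chance-significant items), assuming independent items (binomial model)."""
    below_k = sum(
        math.comb(n_items, i) * alpha ** i * (1 - alpha) ** (n_items - i)
        for i in range(k)
    )
    return 1 - below_k


if __name__ == "__main__":
    print(expected_chance_hits(150, 0.05))
    print(prob_at_least(4, 150, 0.05))
```

Under the independence assumption, four or more chance-significant items would be the usual outcome rather than the exception, which is why replication on a fresh sample is needed before any of the four items is retained.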

No appreciable difference, as is evidenced by Table 2, was found to 
exist between the average mental ability of the successful and unsuccessful 
groups of salesmen. A slight relationship exists between the combination 
of personality characteristics measured by Part II of the Aptitude Index 
and life insurance production. 


Summary 


Based solely on the criterion of written business, and pertaining only 
to those particular life insurance salesmen investigated in this study, the 
following conclusions may be drawn. 







1. The degree of success during approximately the first three months 
offers a significantly better than chance basis for predicting the degree of 
success in life insurance selling at a later date. The correlation between 
sales during the first 13 weeks of selling and a second period of 
13 or more weeks is +.55. 

2. Significant differences in favor of the successful agents were found 
to exist between the two criterion groups with respect to the following 
aspects: 


1. Average number of calls per week. 
2. Number of applications written per 100 persons “asked to buy.” 
3. Number of applications written per 100 persons called upon. 
4. Average size of application. 
5. Average number of applications written per week. 


3. Non-significant differences in favor of the successful agents were 
found to exist between the two criterion groups with respect to the number 
of persons “asked to buy” insurance per week. Since the number of 
persons called upon was significantly higher for the successful group, the 
percentage of persons “asked to buy” per 100 called upon was almost 
identical for the two groups of salesmen. 

4. Of the four personal history items investigated, only one, namely, 
amount of insurance owned at entry, was found to differentiate significantly 
beyond the one per cent confidence level between successful and 
unsuccessful life insurance salesmen. The other three items, age at 
entry, number of dependents, and minimum living expenses per month, 
showed positive relationships to the criterion although no significant 
difference between the two groups in question was found to exist for 
these measures. 

5. The findings of the present study indicate that the Kuder Preference 
Record, as commonly used, may identify life insurance salesmen 
but does not differentiate successful from unsuccessful agents. However, 
the analysis of the present data indicates that there are inherent in the 
Record certain relationships with success in selling life insurance that 
may prove to be useful in selecting high producing salesmen. 

6. No significant differences between the two criterion groups were 
obtained for any of the three component measures of The Guilford-Martin 
Personnel Inventory. A supplementary measure, degree of uncertainty, 
as determined from the number of question-mark responses, 
similarly showed no significant difference to exist. One unusual finding, 
however, deserves mention: the three men in the group whose 
degree-of-uncertainty score was abnormally high were identified as producing very 
far below the mean of the total group. While this number is too small to 







permit generalization, it is suggested that such a score may well warrant 
further investigation. 

7. An item analysis of the 150 items of the Guilford-Martin Inventory 
revealed only four items which distinguished between the criterion groups 
significantly beyond the five per cent level of confidence. The result 
reflected by these four items may be considered to be well within chance 
expectation for a test of the present length. Nevertheless, further in- 
vestigation may possibly prove one or more of these items to be service- 
able enough to warrant their inclusion in a selective device. Although 
not a finding of the present study, it is believed possible that existing 
personality tests when carefully analyzed may reveal behavior patterns 
common to successful life insurance agents. It is also believed that 
unstructured or projective tests may prove of value by tapping those 
personality characteristics not capable of being identified by the usual 
structured test. 

8. No significant difference was found to exist between the mental 
ability test scores of the successful and the unsuccessful salesmen; 
the mean scores of the two criterion groups were for 
all practical purposes the same on The Adaptability Test. 

9. Although no significant difference was obtained between the mean 
raw scores of the two groups in question, trends present in the data 
indicate that Part II of the Aptitude Index may have some predictive 
value. 


Received August 19, 1948. 


References 


1. Barnes, D. F. The Purdue course in life insurance marketing. New York: The 
National Association of Life Underwriters, 1946, pp. 24. 

2. Guilford, J. P., and Martin, H. G. Guilford-Martin personnel inventory, manual of 
directions and norms. Beverly Hills, Calif.: Sheridan Supply Co., 1943, pp. 2. 

3. Kahn, D. F. An analysis of life insurance salesmen. Unpublished master’s thesis, 
Purdue University Libraries, West Lafayette, Indiana, 1946. 

4. Kelley, T. L. Selection of upper and lower groups for the validation of test items. 
J. educ. Psychol., 1939, 30, 17-24. 

5. Kuder, F. G. Intermediate manual for the Kuder preference record. Chicago: Science 
Research Associates, 1944, pp. 16. 

6. Lawshe, C. H., Jr. A nomograph for estimating the validity of test items. J. appl. 
Psychol., 1942, 26, 846-849. 

7. Life Insurance Agency Management Association. The value and use of the Aptitude 
Index. Hartford, Conn.: Life Insurance Agency Management Assoc., 1946, 
pp. 24. 

8. Tiffin, J., and Lawshe, C. H. Preliminary manual for the adaptability test. Chicago: 
Science Research Associates, 1943, pp. 9. 





A Window-Stencil Method for Scoring the Strong 
Vocational Interest Blank (Men) 


J. E. Greene, R. T. Osborne, and Wilma B. Sanders 
The University of Georgia 


The Strong Vocational Interest Blank (Men) is generally recognized 
by clinical psychologists and guidance workers as being one of the most 
useful instruments for determining the vocational interests of male 
counselees. The nature of the standardization of the Strong Blank is 
such, in our belief, as to give it a higher degree of specific validity for 
many counselees than that which may be obtained from other tests of 
vocational interest. On the other hand, many circumstances conspire 
to prevent as frequent and effective use of the Strong Blank as its basic 
validity would seem to warrant. In many counseling situations, the 
counselor may wish to secure immediately Strong scores on selected 
occupations for one or a relatively few clients. Under these circum- 
stances local machine scoring of the test is inadvisable. Moreover, if 
the counselor must send the answer sheet to some off-campus test scoring 
service, there will be an unwanted and often crucial delay in obtaining 
the test results. When for either of these reasons machine scoring be- 
comes inadvisable, the counselor must at present resort to the use of 
the intricate, time-consuming and error-ridden process of hand scoring 
the test by means of the Strong ladder stencils, or of choosing some 
quickly-scorable alternate test of vocational interest which often is less 
valid for his particular purpose than the Strong test would be. 

The background for the development of the simplified scheme of 
hand scoring the Strong Blank herein presented may be briefly stated. 
In 1945 the senior author, while serving temporarily as Director of the 
Veterans Guidance Center of the University of Georgia, became im- 
pressed with the local need for a simplified procedure for hand scoring 
the Strong Blank. A large proportion of our case load consisted of male 
veterans interested in some type of college training. For many of these 
clients it was obvious that the Strong Vocational Interest Blank would 
provide more valid and useful measures of vocational interest than 
would any other instrument commercially available. Consequently, 
the senior author set for himself the task of devising an accurate and quick 
procedure for hand scoring the Strong Blank.¹ The basic procedure 
consisted of the development of four window stencils to which were 
transferred the positive and negative weights assigned to each of the 
400 items of the Blank, for each Strong occupational category separately. 

Since its introduction, this window stencil scoring system has been 
used locally on more than 5000 cases. Our data indicate that a 
semi-skilled psychometrist can score the Strong Blank at the rate of 2½ 
minutes per occupation. In our own set-up, as well as in many similar 
counseling situations, the counselor typically will not need to have the 
Strong scored on all possible keys. Our experience indicates that for a 
particular client we seldom wish scores on more than six of the occupational 
categories. Consequently, the total amount of scoring time for 
the typical client seldom exceeds fifteen minutes. Where large numbers 
of papers are to be scored on all possible keys, one of the Standard IBM 
methods or the Hankes system is more economical. In addition to 
offering less opportunity for addition errors, and other errors due to faulty 
alignment of the scoring stencils, the procedure herein described has the 
advantage of being more economical of time and money. For example, 
the Strong ladder scoring system presupposes that a Strong booklet (8¢ 
each) will be expended for each client, whereas under our system IBM 
answer sheets (IBM Form ITS 1100 B 360 Rev., 2.35¢ each) are used 
and the booklet is not expended. In terms of clerical time involved, our 
window scoring stencil requires only approximately one-fourth as much 
time per occupation as does the Strong ladder scoring system. 
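The time and material figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; treating "approximately one-fourth as much time" as an exact factor of four is our assumption, and the function names are illustrative only.

```python
# Figures quoted in the text above.
MINUTES_PER_KEY = 2.5        # window-stencil rate per occupational key
BOOKLET_CENTS = 8.0          # Strong booklet expended under ladder scoring
ANSWER_SHEET_CENTS = 2.35    # IBM answer sheet used under window scoring
LADDER_TIME_FACTOR = 4.0     # assumption: "one-fourth as much time" read as exactly 4x


def window_minutes(keys: int) -> float:
    """Clerical minutes to score one client on the given number of keys."""
    return keys * MINUTES_PER_KEY


def ladder_minutes(keys: int) -> float:
    """Rough ladder-stencil equivalent, under the 4x assumption above."""
    return window_minutes(keys) * LADDER_TIME_FACTOR


def material_saving_cents(clients: int) -> float:
    """Cents saved by using answer sheets instead of expendable booklets."""
    return clients * (BOOKLET_CENTS - ANSWER_SHEET_CENTS)


print(window_minutes(6))            # six keys, the usual upper limit per client
print(ladder_minutes(6))
print(material_saving_cents(100))
```

Six keys at 2½ minutes each give the fifteen minutes per client stated in the text.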


1 Our earliest scheme for using window stencils was devised by the senior author. 
The junior authors have subsequently refined and further simplified our earliest pro- 
cedures. For example, our original procedure required 16 separate window stencils for 
each of the Strong keys, as follows: 

(a) 4 stencils for the positive weights on page 1 of the IBM Answer Sheet, a separate 
stencil for weight +1, +2, +3, and +4. 

(b) 4 stencils for the positive weights on page 2 of the IBM Answer Sheet, a separate 
stencil for weight +1, +2, +3, and +4. 

(c) 4 stencils for the negative weights on page 1 of the IBM Answer Sheet, a separate 
stencil for weight —1, —2, —3, and —4. 

(d) 4 stencils for the negative weights on page 2 of the IBM Answer Sheet, a separate 
stencil for weight —1, —2, —3, and —4. 

As contrasted with our earlier procedure which employed the 16 separate window 
stencils indicated above, the present system employs only 4 window stencils. The 
four separate stencils indicated under (a), (b), (c), and (d) above have each been con- 
solidated into a single stencil. As is indicated in Figure 1, all of the positive weights 
for page 1 of the answer sheet are shown on the same window stencil. Weights of +2, 
+3, and +4 are indicated to the right of the respective windows; all the remaining 
windows for this stencil have a weight of +1, but our experience indicates that it is 
preferable not to show the weight of +1 to the right of the window. 







Development of the Window Stencil System 


It is proposed to describe our procedure in some detail so that persons 
who wish to do so may prepare their own window stencils for as many 
or as few of the Strong keys as may be desired in a local counseling 
situation. As was implied above, our basic procedure consisted of 
transferring from the Strong ladder stencils for a given occupational 
category (e.g., Chemist) to our own window stencils for that same category 
the several positive (+1, +2, +3, and +4) weights and negative (—1, 
—2, —3, and —4) weights assigned to any given response to each of the 
400 items in the Strong Blank. This process of transferring weights, 
involving the several steps indicated below, will be illustrated with the 
scale for Chemist. 

Step 1. Making use of page 1 (items 1-200) of the IBM Answer Sheet 
for Strong’s Vocational Interest Blank for Men (Revised) Form M² for 
recording the value of the various weights involved, we transferred from 
the Strong ladder stencil for Chemist all the +1, +2, +3, and +4 
weights which Strong assigned to these 200 items on the Chemist scale.³ 
In the same manner, all the positive weights assigned to items 201-400 
were recorded on page 2 of a second Strong Answer Sheet. Thus these 
two separate recordings carried all the positive weights assigned to 
Chemist in the 400 items of the Strong Blank. Similarly, all the —1, 
—2, —3, and —4 weights assigned to items 1-200 were recorded on page 
1 of a third answer sheet and the negative weights assigned to items 
201-400 were recorded on page 2 of a fourth answer sheet. This procedure 
resulted, therefore, in transferring from 9 ladder stencils (each having 3 
separate columns of weights of varying size and sign) to 4 answer sheets 
all the positive and negative weights assigned to Chemist on the 400 items 
comprising the Strong test. 

Step 2. The final step involved in preparing the 4 window stencils to 
replace the 9 ladder stencils for the Chemist scale required little time 
and material. In punching all of our window stencils we used the 
standard heavy cardboard form, International Test Scoring Machine 


2 IBM Form ITS 1100 B 360 Rev. Copyrighted by the Board of Trustees of Leland 
Stanford Junior University. 

3 In practice, this procedure of transferring weights will be facilitated if two persons 
cooperate in the following manner: One person will apply the ladder stencil for Chemist 
to the Strong Interest Blank booklet and read off the +1, +2, +3, and +4 weights 
assigned to each response to items 1-200. For example, “Item 1—no weight; Item 2— 
D, +2; Item 3—I, +1; Item 4—no weight; Item 5—no weight; Item 6—L, +2; Item 
7—D, +1; Item 8—D, +2; Item 9—no weight; Item 10—L, +4; etc.” The second 
person will record these positive weights in the appropriate spaces on page 1 of the answer 
sheet. The appropriate weights for Stencils B, C, and D will be determined similarly. 







Key Form A.⁴ Each of the 4 answer sheets described above was used 
as a basis for punching a window stencil to which the weights recorded 
on the answer sheets were accurately assigned. For example, the 
answer sheet which recorded the +1, +2, +3, and +4 values on items 
1-200 was fitted exactly against the back⁵ of one of the cardboard forms 
1000 A 310 and appropriate response positions were punched through 
both the answer sheet and the cardboard form with a pin. Then, working 
from the front (i.e., printed) side of Form 1000 A 310, each circle⁶ through 
which the pin had been punched was converted into a “window” with an 
IBM hand punch. The weight of each response to each item was indi- 
cated according to the procedure described in footnote number 1 and 
illustrated in Figure 1. 


























Fig. 1. Man chemist, stencil A: plus weights, items 1-200. 
(For page 1 of Answer Sheet.) 
Note: If sufficient demand should develop for these window stencils arrangements 
will probably be made with Stanford University Press to produce them. 


Thus for each occupational scale, 4 window stencils were prepared, as 
follows: 


4 IBM Form ITS 1000 A 310. These forms may be procured from the International 
Business Machines Corporation. Cost, 2.3¢ each. 

5 The back rather than the front of the cardboard form was used in order to reduce 
the amount of eye strain in scoring. The circles on the front of the cardboard form 
tend to produce mental confusion and fatigue of the eye muscles. 

6 These circles were used as guides in securing accuracy in punching. 







Stencil A. Positive weights, page 1 of Answer Sheet (items 1-200); 
Stencil B. Positive weights, page 2 of Answer Sheet (items 201-400); 
Stencil C. Negative weights, page 1 of Answer Sheet (items 1-200); 
Stencil D. Negative weights, page 2 of Answer Sheet (items 201-400). 
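The division of a scoring key among the four stencils can be sketched as a simple partition by sign and by answer-sheet page. This is an illustration only: the dictionary layout and the name `build_stencils` are our assumptions, the positive weights for items 2 to 10 are those read off in footnote 3, and the remaining weights are invented for the example.

```python
from typing import Dict, Tuple

# (item number, response letter) -> signed weight.
# Response letters L, I, D follow the read-off example in footnote 3.
Key = Dict[Tuple[int, str], int]


def build_stencils(key: Key) -> Dict[str, Key]:
    """Split one occupational key into stencils A, B, C and D."""
    stencils: Dict[str, Key] = {"A": {}, "B": {}, "C": {}, "D": {}}
    for (item, response), weight in key.items():
        if weight == 0:
            continue  # unweighted responses get no window
        if not 1 <= item <= 400:
            raise ValueError(f"item {item} is outside the 400-item blank")
        first_page = item <= 200  # page 1 carries items 1-200
        if weight > 0:
            name = "A" if first_page else "B"
        else:
            name = "C" if first_page else "D"
        stencils[name][(item, response)] = weight
    return stencils


chemist_key: Key = {
    # Positive weights quoted in footnote 3.
    (2, "D"): 2, (3, "I"): 1, (6, "L"): 2, (7, "D"): 1, (8, "D"): 2, (10, "L"): 4,
    # Invented weights, present only to populate stencils B, C and D.
    (2, "L"): -1, (201, "L"): 3, (250, "D"): -2,
}

stencils = build_stencils(chemist_key)
for name, windows in stencils.items():
    print(name, windows)
```

Each entry in a stencil corresponds to one punched window, with the weight written beside it when it differs from 1.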


Procedures for Window-Stencil Scoring 


Once the window stencils have been prepared, hand scoring of the 
Strong (Men) becomes greatly simplified. Obviously, to evaluate the 
client’s interest in a given Strong occupational category, it is necessary 
to secure the algebraic sum of the positive and negative weights which 
he earned on that scale. For any given scale, the sum of his positive 
weights may be quickly determined by applying window stencil A to 
page 1 of the answer sheet and window stencil B to page 2 of the answer 
sheet. Similarly, the sum of his negative weights may be obtained by 
appropriate application of window stencils C and D. The algebraic 
total of these two sums constitutes his total raw score on the given 
occupational category. The raw score thus obtained corresponds exactly 
to the raw score obtainable by ladder stencil or machine scoring pro- 
cedures and may be interpreted accordingly. 
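The scoring rule just described amounts to two sums and their algebraic total. A minimal sketch follows, assuming each stencil is held as a mapping from (item, response) to weight and the client's marks as a mapping from item to response; all weights and marks shown are invented for illustration and are not taken from any Strong key.

```python
from typing import Dict, Tuple

Stencil = Dict[Tuple[int, str], int]


def stencil_sum(stencil: Stencil, answers: Dict[int, str]) -> int:
    """Add the weights of every window through which a mark is visible."""
    return sum(
        weight
        for (item, response), weight in stencil.items()
        if answers.get(item) == response
    )


def raw_score(stencils: Dict[str, Stencil], answers: Dict[int, str]) -> Tuple[int, int, int]:
    """Return (positive sum, negative sum, algebraic total) for one scale."""
    positive = stencil_sum(stencils["A"], answers) + stencil_sum(stencils["B"], answers)
    negative = stencil_sum(stencils["C"], answers) + stencil_sum(stencils["D"], answers)
    return positive, negative, positive + negative


# Invented example data.
example_stencils = {
    "A": {(2, "D"): 2, (3, "I"): 1, (6, "L"): 2, (10, "L"): 4},
    "B": {(201, "L"): 3},
    "C": {(2, "L"): -1},
    "D": {(250, "D"): -2},
}
example_answers = {2: "L", 3: "I", 6: "L", 10: "D", 201: "L", 250: "D"}

print(raw_score(example_stencils, example_answers))  # (6, -3, 3)
```

Keeping the positive and negative sums separate mirrors the clerical procedure, in which stencils A and B are totaled before stencils C and D are applied.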


Evaluation 


Although a considerable amount of exacting work was involved in our 
preparation of the window stencils herein described, it has been our ex- 
perience that this labor expenditure was of minor significance in com- 
parison to the vast and varied benefits which we have derived from their 
use. In budgetary terms, two types of savings have been notable: (1) 
a marked decrease in clerical time involved in window stencil scoring as 
contrasted with ladder stencil scoring; (2) use of IBM Answer Sheets 
instead of expendable Strong booklets has markedly reduced the per 
capita cost of testing materials and has thus permitted much more ex- 
tensive use of the Strong test than otherwise would have been feasible. 
Finally, our experience indicates that the margin of error in scoring the 
Strong by our window stencil procedure is markedly less than that ob- 
tained when the Strong ladder stencil system is used. 


Received January 25, 1949. 
Early publication. 





A Short Test of Mental Ability 
Jay L. Otis and David J. Chesler 


Personnel Research Institute, Western Reserve University 


A survey of 26 paper-and-pencil tests of mental ability suitable for 
use at the adult level, practically all of which are listed in the Nineteen 
Forty Mental Measurements Yearbook (1), showed that the range of 
“examination time” varied from 12 to 153 minutes. Five of these tests 
required 16 minutes or less. The median examination time was 32 
minutes. It would seem that there are few short tests of mental ability 
suitable for adults, “short” in this connection being defined as approximately 
15 minutes or less. 

While, in general, no claims of superiority with respect to reliability 
or validity can be made for the short test as compared with a longer 
test, nevertheless the short test has demonstrated its usefulness and 
practicability, and, in many instances, certain advantages over the 
longer test. In the industrial employment office, where time is often at 
a premium, the short test of mental ability can yield results of more 
than acceptable validity with respect to the types of jobs and individuals 
involved. In those situations where the standards are more precise, 
the short test may be used to screen out those individuals who are obvi- 
ously below or above the desired mental standards, so that a longer test 
of mental ability and tests for other functions will be reserved for those 
who fall within the accepted range. This is an economical procedure, 
both to the applicant and to the organization. The applicant who 
cannot possibly qualify is prevented from embarking on a lengthy 
testing program, and the organization is saved the time and expense in- 
volved in administering and scoring a complete test battery. 

In the vocational guidance situation, the short test of mental ability 
has very useful application, also, in that it may be an excellent indicator 
of the type (e.g., “elementary,” “intermediate,” or “advanced”) of 
longer test that should be administered. It is not an uncommon ex- 
perience with psychometrists and vocational counselors to realize that 
the counselee has taken, or is in the process of taking, a test of mental 
ability which is inappropriate to his level. A short test of mental ability 
used as a “pre-test” will prevent this from happening. A knowledge of the 
testee’s intelligence, obtained before the test battery is decided upon, is 
also extremely helpful in determining what special tests of aptitude and 


146 





A Short Test of Mental Ability 147 


achievement should be administered. For example, in the case of a 
counselee who wants to go to college, but whose pre-test shows him to be 
significantly below average in intelligence, tests of aptitude should be 
administered which are more applicable to a lower level of employment 
or training, and tests applicable at the college level should be omitted. 
The short pre-test serves other purposes in the counseling situation. 
If administered before the initial interview, it provides clues as to the 
degree to which the counselee will understand the verbal give-and-take 
of the initial interview. It can also be a fast and reliable determiner of 
the necessity for an individual rather than a group test of mental ability. 

For these reasons the Personnel Research Institute of Western Re- 
serve University initiated in 1942 a research project with the purpose of 
developing a short test of mental ability. The Personnel Research 
Institute was in an excellent position to undertake this project since it 
was set up to carry on personnel research in such areas as the development 
of procedures for employment and training of workers, as well as the 
development of techniques in the field of vocational guidance (2). The 
activities of the Personnel Research Institute solved in large part the 
problem of obtaining suitable populations for the standardization and 
validation of a new test. The result of this research is the Classification 
Test for Industrial and Office Personnel (3). 


Description of the Test 


The Classification Test for Industrial and Office Personnel is primarily 
a measure of mental ability at the adult level, although evidence has 
accumulated that it is also satisfactory for use at the high school level. 
It is a self-administering group test and intended for individuals who 
know how to read. An attempt was made to include items of approxi- 
mately uniform difficulty throughout the test and to keep the difficulty 
level relatively low. Most group tests of intelligence present items in 
order of increasing difficulty. This is often discouraging to the ordinary 
shop or office worker. In addition, an increasing order of difficulty tends 
to reduce the total number of items required. In a short test of mental 
ability constructed on this basis many subjects reach their difficulty 
ceiling in a very short time (perhaps 7 or 8 minutes) so that the effective 
number of items is reduced still further. In the standardization of such 
a test it is usually found that the successful completion of even two or 
three additional items represents a large increase in the standard score or 
percentile rating. In other words, the individual who “gets stuck” on 
an item in a short test seems to be penalized unduly in his final rating. 
For this reason the Classification Test for Industrial and Office Personnel 
contains 100 items, which are as many as appear in longer tests. Indi- 
viduals at the college level will often answer correctly as many as 90 items 
and a small number of individuals (about 4 per cent) at this level will just 
about succeed in attempting every item in the maximum time allowed. 

Type and Arrangement of Items. The 100 items are spiralled in series 
of five as follows: vocabulary, general information, arithmetic, general 
information, and analogies. There is thus a total of 40 general informa- 
tion items and 20 each of vocabulary, arithmetic, and analogies. The 
entire test is contained in a four-page booklet with the directions and 
practice problems on the first page and the test items on pages 2, 3, and 4. 
All of the items are of the multiple-choice type, with four alternates. 
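The item counts follow directly from the spiral arrangement, as a short sketch shows. It assumes that the five-item cycle begins with item 1 in the order listed; the name `item_type` is illustrative only.

```python
from collections import Counter

# One cycle of the spiral, in the order given in the text.
CYCLE = (
    "vocabulary",
    "general information",
    "arithmetic",
    "general information",
    "analogies",
)


def item_type(number: int) -> str:
    """Item type for a 1-based item number, assuming the cycle starts at item 1."""
    return CYCLE[(number - 1) % len(CYCLE)]


counts = Counter(item_type(n) for n in range(1, 101))
print(dict(counts))
```

Twenty complete cycles give 40 general information items and 20 each of the other three types, in agreement with the totals stated above.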

Time Limits. The time limit of the Classification Test for Industrial 
and Office Personnel has been kept to a minimum to make it practical 
to use in the employment situation. It is possible to use a time limit 
of either 10 or 15 minutes. The 15-minute time limit is recommended 
since the norms for this time limit are based on an appreciably larger 
number of cases than for the 10-minute period. 

Standardization. Originally the test was administered in tentative 
form to over 3000 subjects. These included general college students, 
engineering college students, evening college students, high school stu- 
dents, nursing school applicants, clerical workers from typical manu- 
facturing establishments, salesmen, and factory workers. The test went 
through two mimeographed versions and one printed version on an experi- 
mental basis before it was published in its final form. 

Reliability and Equivalence of Forms. The odd-even reliability of the 
test, as corrected by the Spearman-Brown formula, is .94. Two forms 
of the test, A and B, are available. A correlation of .86 between the two 
forms was obtained when they were administered in A-B order to a 
group consisting of 90 academic high school students and 159 college 
students. A correlation of .85 was obtained for a group of 72 commercial 
high school students. Correlations of .80 and .82 were obtained for 
similar groups who took the tests in B-A order. 
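For readers who wish to see the Spearman-Brown correction applied, a brief sketch is given below. The uncorrected odd-even correlation is not reported in the text; the value printed here is merely back-calculated from the corrected figure of .94 and should be read as an inference, not a reported result.

```python
def spearman_brown(r_part: float, k: float = 2.0) -> float:
    """Predicted reliability when a test is lengthened k times.

    With k = 2 this is the usual correction applied to an odd-even
    (half-test) correlation.
    """
    return k * r_part / (1.0 + (k - 1.0) * r_part)


def half_test_correlation(r_full: float) -> float:
    """Invert the k = 2 correction: the half-test r implied by a corrected r."""
    return r_full / (2.0 - r_full)


implied_half_r = half_test_correlation(0.94)
print(round(implied_half_r, 3))
print(round(spearman_brown(implied_half_r), 2))
```

A corrected coefficient of .94 thus implies a correlation of roughly .89 between the odd and even halves.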

The differences in difficulty between the two forms are practically 
negligible and appear to approach the minimum that can be expected. 
For a group of 389 academic high school and college students, the dif- 
ference was 1.34 raw score points. For a group of 67 commercial high 
school students the difference in difficulty was -.178 raw score points. 
The practice effect for these two groups was 4.76 and 2.42 respectively. 

Validity. Validity coefficients obtained thus far are of two types: 
(1) correlations with other tests of intelligence, and (2) correlations with 
job or school performance. These validity coefficients are presented in 
Table 1. Since this test is short and does not cover the entire range of 
mental ability, correlations between it and longer tests of intelligence are 
not as high as are usually obtained between longer tests of intelligence. 
As can be seen from Table 1, the test has demonstrated low but positive 
validity in the industrial situation and somewhat better validity in the 
commercial school situation. It would appear, however, that because 
of its short time limit, the test is appropriate as part of a battery designed 
for industrial or school use. It is of interest to note that a critical norm 
of 40 was established in two validity studies. In the first of these the 
test was used as part of a battery to select salesmen. It was found that 
men scoring below 40 were difficult to train and inferior in sales perfor- 
mance. In the second study it was found that men scoring below 40 
were poor risks for the job of bus or street car operator. 

Norms. The following norms are available: Adult (N = 1662); 
general college (N = 946); engineering college (N = 113); evening college 
(N = 293); high school (N = 383); nursing school applicants (N = 254); 
clerical workers (N = 137); sales personnel (N = 225); factory workers 
(N = 1494); general population (N = 6007). 


Table 1 

Validity Coefficients for the Classification Test for Industrial and Office Personnel 

[The N and r columns are not legible in this copy; only the Group and Criterion columns survive.] 

Group                      Criterion 

Other Tests 
College                    A.C.E., 1942 Edition 
High School                Otis S-A, Higher Forms B and D 
Clerical Employees         Otis S-A, Higher Forms A, B, and D 
General Adult              Otis S-A, Higher Form D 
General Adult              A.C.E., 1941 Edition 
Nursing Applicants         California Mental Maturity, Form A 
Junior Clerks              Otis S-A, Higher Form D 

School Course Grades 
High School                Business Information and Mathematics 
High School                Typing 
High School                Bookkeeping 
High School                Stenography 
High School                Office Production 
High School                Filing 
High School                Machine Calculation 

Job Performance 
Maintenance Salesmen       Ratings of sales performance 
Maintenance Salesmen       Total sales for two years 
Heater Salesmen            Sales ability (biserial r) 
Junior Clerks              Job rating 
Junior Clerks              Progress rating 










Summary 


A short test of mental ability has been described which, it is felt, is 
very appropriate for use in the industrial and vocational guidance situa- 
tions. This test is the Classification Test for Industrial and Office Per- 
sonnel, Forms A and B. 

The distinguishing characteristics of this test are: (1) a short time 
limit; (2) a large number of items of approximately uniform difficulty, 
rather than a small number of items presented in order of increasing 
difficulty. It is believed that this sort of mental ability test is more suit- 
able to the typical office or factory employment situation than the usual 
type of intelligence test. 

At the present writing the test has been standardized on over 6000 
subjects. The odd-even reliability is .94, and the correlation between 
alternate forms varies from .80 to .86. Differences in difficulty between 
the two forms are practically negligible. Norms are available for nine 
different industrial and school populations. Validities with other, longer, 
tests of mental ability range from .62 to .83. Validities with grades in 
commercial high school courses range from .27 to .56. Validities with 
various criteria of job performance range from .21 to .49. 


Received October 1, 1948. 
References 


1. Buros, O. K., Ed. The nineteen forty mental measurements yearbook. Arlington, 
Va.: Gryphon Press, 1945. 

2. Otis, J. L. The Personnel Research Institute of Western Reserve University. J. 
consult. Psychol., 1946, 10, 131-135. 

3. Otis, J. L., et al. Classification test for industrial and office personnel (Forms A 
and B). Cleveland, Ohio: Western Reserve University Press, 1947. 





Abbreviated Job Evaluation Scales Developed on the Basis 
of “Internal” and “External” Criteria 


David J. Chesler 


Personnel Research Institute, Western Reserve University 


In recent years much of the published material in the field of job 
evaluation which might properly be designated as “research” has been 
concerned with abbreviated job evaluation scales. Most of this work has 
been performed by Lawshe and various associates (4, 5, 6, 7). The 
writer (3) has also presented some findings on this topic. All of these 
studies utilized the Wherry-Doolittle selection method (8) to derive the 
abbreviated scales. The procedure has been to apply the Wherry- 
Doolittle process to the factors or “rating scale items” which comprise a 
job evaluation scale, and to identify the first three or four factors in the 
scale which contribute most to the ratings which jobs receive on the scale. 
The ratings predicted from these three or four factors are then compared 
with the ratings received on all of the original factors. The criterion 
is the original job evaluation scale from which the abbreviated scale was 
derived. 
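The general idea of selecting the few factors that best reproduce a criterion can be illustrated with a greedy forward selection on the multiple correlation R. This is a simplified stand-in and not the Wherry-Doolittle procedure itself, which works from the correlation matrix and applies a shrinkage correction; the factor names, ratings, and function names below are invented for illustration, and collinear factors are not handled.

```python
import math
from typing import Dict, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """R squared of the least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]
    xtx = [[sum(p * q for p, q in zip(u, v)) for v in design] for u in design]
    xty = [sum(p * q for p, q in zip(u, y)) for u in design]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, design)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(
    factors: Dict[str, Sequence[float]], criterion: Sequence[float], keep: int
) -> List[Tuple[str, float]]:
    """Add, one at a time, the factor that raises R the most; return (name, R) pairs."""
    chosen: List[str] = []
    history: List[Tuple[str, float]] = []
    while len(chosen) < keep:
        candidates = [name for name in factors if name not in chosen]
        best = max(
            candidates,
            key=lambda name: r_squared([factors[c] for c in chosen + [name]], criterion),
        )
        chosen.append(best)
        r2 = r_squared([factors[c] for c in chosen], criterion)
        history.append((best, math.sqrt(max(r2, 0.0))))
    return history


# Invented ratings for eight jobs on three factors.
factors = {
    "experience": [1, 2, 3, 4, 5, 6, 7, 8],
    "supervision": [2, 1, 4, 3, 6, 5, 8, 7],
    "conditions": [1, 1, 2, 1, 2, 1, 2, 1],
}
# Invented criterion: three times "experience" plus "conditions".
criterion = [3 * e + c for e, c in zip(factors["experience"], factors["conditions"])]

history = forward_select(factors, criterion, keep=2)
for name, r in history:
    print(name, round(r, 3))
```

Replacing the criterion with total ratings on the same manual corresponds to the "internal" case, and replacing it with ratings on a different manual corresponds to the "external" case studied here.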

The present study has attempted to answer the question as to which 
three or four factors in a job evaluation scale would be identified if 
another job evaluation scale were used as the criterion. Such a criterion 
has been designated throughout this report as an “external” criterion, 
in contrast to the rating on the original manual, which may be designated 
as the “internal” criterion. Will similar abbreviated scales emerge when 
various job evaluation manuals constitute the external criteria? It is 
believed that a study of this sort offers a method of analyzing the differ- 
ences between two job evaluation manuals. Specifically, it answers the 
question of what factors in one job evaluation system constitute the best 
measure of another system. 


Method 


Job raters in three industrial organizations independently rated descriptions 
and specifications for 35 “standard” salaried jobs on a “standard” 
job evaluation manual and on their own respective company 
manuals. The jobs, the standard manual, the company manuals, and the 
job analysts involved are the same as those reported in previous studies 
(2, 3). 









Results and Interpretation 


Standard Manual Factors Identified with Internal Criterion. As re- 
ported previously (3), the Wherry-Doolittle selection method was applied 
to the standard manual factor ratings submitted by the raters in the three 
companies, with total rating on the standard manual as the (internal) 
criterion. With the internal criterion the first four factors identified 
with each of the three groups of raters were the same, although the order 
of identification was not the same.¹ These four factors were: “Work 
experience”; “character of supervision received”; “character of supervision 
given”; and “responsibility for confidential matters.” 


Table 1 


Abbreviated Scales Derived from Standard Manual with External Criteria in Three 
Companies by Raters Who Rated the Standard Jobs on the Standard 
Manual and on Their Respective Company Manuals 








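        Co. A                   Co. B                   Co. C 

  Factor No.     R        Factor No.     R        Factor No.     R 

       5       .839            6 
       2       .921            9 
       6       .941           10 
      10       .956            8 
                              11 

[The remaining entries of this table are not recoverable from this copy. The surviving R values, whose column assignment is unclear, are: .905, .959, .969, .974, .977, .978, .979, .979, .978.] 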





Key to Factor Numbers: 2. Essential knowledge and training; 4. Character of 
supervision received; 5. Character of supervision given; 6. Number supervised; 7. Re- 
sponsibility for funds, securities, and other valuables; 8. Responsibility for confidential 
matters; 9. Responsibility for getting along with others; 10. Responsibility for accu- 
racy—effect of errors; 11. Pressure of work; and 12. Unusual working conditions. 


Standard Manual Factors Identified with External Criterion. The 
procedure followed in the present study was to apply the Wherry- 
Doolittle selection process to the factors of the standard manual, with 
total ratings on a company manual as the (external) criterion. The 
results are summarized in Table 1. 

Since comparisons of abbreviated scales have previously (3) been 
made on the basis of the first four factors identified, we need concern 
ourselves in Table 1 only with the first four factors identified in each 


1 For a more complete discussion of these findings, see a previous study (3). 







instance. The striking feature of the abbreviated scales that emerge 
with different external criteria is their dissimilarity—as contrasted with 
the striking similarity of the abbreviated scales that emerged with the 
same internal criterion (3). The number of times certain factors were 
identified among the first four factors for the three groups of raters may 
be summarized as follows: 


Factor                                                       No. Times 
Number    Factor                                             Identified 

   2      Essential knowledge and training 
   4      Character of supervision received 
   5      Character of supervision given 
   6      Number supervised 
   8      Responsibility for confidential matters 
   9      Responsibility for getting along with others 
  10      Responsibility for accuracy-effect of errors 
  11      Pressure of work 

          Total 

[The counts in the right-hand column are not legible in this copy.] 

Out of a possible total of twelve factors, eight appeared either once 
or twice. It is interesting that no single factor emerged three times, that 
is, once in each of the abbreviated scales derived with an external criterion. 
Of the eight factors, three (“character of supervision received,” “character 
of supervision given,” and “responsibility for confidential matters”) 
were also identified in abbreviated scales derived with the internal cri- 
terion (3). It would appear that these three factors are important, not 
only in the standard manual, but also in some form or other in the 
manuals used in the three companies. The fact that two “supervisory” 
items were identified, not only with the internal criterion, but also with 
three different external criteria would indicate that in the standard and 
the company manuals factors concerned with supervision are very im- 
portant. 

It would seem that the results obtained with external criteria indicate 
primarily essential differences among the company manuals, as analyzed 
in terms of the standard manual factors. This may be contrasted with 
the results obtained with the same internal criterion—which indicate 
primarily differences among the raters (3). 

Adequacy of Abbreviated Scales Derived from Standard Manual with 
External Criterion in Predicting External Criterion. As in the case of 
abbreviated scales derived with internal criteria (3), an analysis was made 
of the accuracy with which the abbreviated scales, derived from the 
standard manual with external criteria, predict the external criteria, 
that is, ratings on the company manuals. 







The multiple regression equations for predicting total points on the 
company manuals from point ratings on the selected standard manual 
factors were computed and applied to the ratings given on the selected 
standard manual factors by the raters in the three companies. Three 
sets of predicted company manual ratings were thus obtained. 

The actual classification plans of the three companies (see Table 2)²
were used to study the comparative adequacies of the three abbreviated 
scales. The labor grades within each of the company plans are unequal 
and follow roughly a geometric rather than an arithmetic progression. 

Table 3 shows the per cent of jobs in each instance which remained 
in the same labor grade, or which were displaced into another labor grade. 
In companies A, B, and C, respectively 88.5 per cent, 91.4 per cent, and 
97.1 per cent of the jobs remained in the same labor grade or were dis- 
placed into a labor grade adjacent to that of the original classification. 
In all three companies some jobs were displaced by two or three labor 
grades. 
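The displacement tally summarized in Table 3 can be sketched in a few lines of Python. This is only an illustration of the counting: the grade boundaries, point totals, and function names below are hypothetical, since the actual company classification plans are in the deposited Table 2 and are not reproduced here.

```python
from bisect import bisect_right

def labor_grade(points, lower_bounds):
    """Return the 1-based labor grade whose point range contains `points`.

    `lower_bounds` lists the lowest point value of each grade in ascending
    order (hypothetical values; grades need not be equally wide).
    """
    return max(1, bisect_right(lower_bounds, points))

def displacement_percentages(actual, predicted, lower_bounds):
    """Per cent of jobs displaced by 0, 1, 2, ... labor grades."""
    counts = {}
    for a, p in zip(actual, predicted):
        shift = abs(labor_grade(a, lower_bounds) - labor_grade(p, lower_bounds))
        counts[shift] = counts.get(shift, 0) + 1
    n = len(actual)
    return {shift: 100.0 * c / n for shift, c in sorted(counts.items())}

# Hypothetical plan whose grade widths grow roughly geometrically.
bounds = [100, 120, 144, 173, 208, 250]
actual_points = [105, 130, 150, 180, 215, 260]      # company manual totals
predicted_points = [110, 118, 149, 255, 214, 255]   # regression predictions

pct = displacement_percentages(actual_points, predicted_points, bounds)
same_or_adjacent = pct.get(0, 0.0) + pct.get(1, 0.0)
print(pct, same_or_adjacent)
```

The quantity `same_or_adjacent` corresponds to the kind of figure quoted in the text (jobs remaining in the same labor grade or displaced into an adjacent one).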

Table 4 shows how ratings on the abbreviated scales derived with 
external criteria deviated by the point value of 0.5 labor grade, 1.0 labor 
grade, or more than 1.0 labor grade from total ratings on the original 
(company) manuals. In the three companies respectively 31.4 per cent, 
54.3 per cent, and 68.6 per cent of the predicted ratings deviated from the 
original ratings by the point value of 0.5 labor grade or less. Similarly 
65.6 per cent, 85.7 per cent, and 88.6 per cent of the predicted ratings 
deviated from the original ratings by the point value of 1.0 labor grade 
or less. In other words, for the three companies respectively 34.4 per cent,
14.3 per cent, and 11.4 per cent of the predicted ratings deviated from 
the original ratings by a point value greater than one labor grade. 

Adequacy of Abbreviated Scales Derived from Standard Manual with 
Internal Criterion in Predicting External Criterion. The first four factors 
of the standard manual consistently identified by the Wherry-Doolittle 
selection process with total rating on the standard manual as the (internal) 
criterion were factors 1, 4, 5, and 8, that is, “work experience,” “character
of supervision received,” “character of supervision given,” and “responsibility
for confidential matters” (3). These factors might be described
as the primary factors of the standard manual because, when
weighted properly, they are the “best measure” of total ratings on the
standard manual. It is of interest to know how well this best measure 
of a manual measures total ratings on other manuals. 

² Tables 2 to 6 inclusive have been deposited with the American Documentation
Institute. Order Document 2558 from American Documentation Institute, 1719 N St., 
N.W., Washington 6, D. C., remitting $0.50 for microfilm (images 1 inch high on stand- 
ard 35 mm. motion picture film) or $0.70 for photocopies (6 by 8 inches) readable 
without optical aid. 







The specific question to be answered here is how well the abbreviated
scales derived from the standard manual, with total ratings on the
standard manual as the (internal) criterion, predict the external criterion,
that is, company manual ratings.

The multiple regression equations for predicting company manual 
ratings from factors 1, 4, 5, and 8 of the standard manual were computed 
and applied to the ratings given on these factors by the raters in the 
three companies. Three sets of predicted company manual ratings were 
thus obtained. 

Again, the actual classification plans of the three companies (see 
Table 2) were used to study the comparative adequacies of the abbrevi- 
ated scales. Table 5 shows the per cent of jobs in each instance which 
remained in the same labor grade or which were displaced into labor 
grades one, two, or more grades removed from that of the original classifi- 
cation. In companies A, B, and C, respectively 68.5 per cent, 94.2 
per cent, and 91.4 per cent of the jobs remained in the same labor grade 
or were displaced into a labor grade adjacent to that of the original
classification.

Table 6 shows how predicted company manual ratings based on the 
abbreviated scales derived with internal criteria deviated by the point 
value of 0.5, 1.0, or more than 1.0 labor grade from total ratings on the 
original (company manual) scales. In the three instances 11.5 per cent, 
40.0 per cent, and 60.0 per cent of the predicted ratings deviated from the 
original ratings by the point value of 0.5 labor grade or less; and 42.8 
per cent, 71.4 per cent, and 91.4 per cent of the predicted ratings deviated 
from the original ratings by the point value of 1.0 labor grade or less. 

Comparison of All Abbreviated Scales Derived from Standard Manual. 
In both the present and in a previous study (3) abbreviated scales have 
been derived from a standard manual, with an internal criterion (standard 
manual total rating) and with an external criterion (company manual 
total rating). Table 7 summarizes the data required to form an opinion 
as to the relative adequacies of these abbreviated scales in predicting 
either the internal or external criterion. 

In terms of the multiple coefficient of correlation and the index of fore- 
casting efficiency the adequacy of prediction is clearly in the hierarchy: 


1. Abbreviated scales derived with internal criterion and used to 
predict the internal criterion. 

2. Abbreviated scales derived with external criterion and used to 
predict the external criterion. 

3. Abbreviated scales derived with internal criterion and used to 
predict the external criterion. 







Table 7*

Adequacy of Abbreviated Scales Derived from Standard Manual with Internal
Criterion (Standard Manual Ratings) and External Criterion
(Company Manual Ratings)

                                            Derived with   Derived with   Derived with
                                            internal       external       internal
                                            criterion;     criterion;     criterion;
                                            used to        used to        used to
                                            predict        predict        predict
                                            internal       external       external
                                      Co.   criterion      criterion      criterion

Multiple coefficient of correlation    A    .98            .96            .91
(R)                                    B    .98            .96            .88
                                       C    .99            .98            .97

Index of forecasting efficiency (E)    A    .79            .72            .59
                                       B    .81            [illegible]    .52
                                       C    .85            .78            .75

% jobs remaining in same, or           A                   88.5           68.5
displaced into adjacent, labor         B                   91.4           94.2
grade                                  C                   97.1           91.4

% predicted ratings deviating by       A                   65.6           42.8
value of 1.0 labor grade or less       B                   85.7           71.4
from original ratings                  C                   88.6           91.4

* See footnote 2.


This hierarchy is apparent from the fact that all R’s and E’s decrease 
as one reads across each row. 
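The index of forecasting efficiency is conventionally defined from a correlation as E = 1 - sqrt(1 - R^2). The text does not state which formula was used, so the following Python sketch rests on the assumption that this conventional definition applies; the function name is illustrative only. It shows why small differences in R near 1.0 correspond to much larger differences in E.

```python
import math

def forecasting_efficiency(r):
    """Index of forecasting efficiency, E = 1 - sqrt(1 - r**2)."""
    return 1.0 - math.sqrt(1.0 - r * r)

# Company A multiple correlations as printed in Table 7 (two decimals).
for r in (0.98, 0.96, 0.91):
    print(f"R = {r:.2f}  ->  E = {forecasting_efficiency(r):.2f}")
```

For Co. A this gives approximately .80, .72, and .59 against the printed .79, .72, and .59; the small difference in the first value is what one would expect if R had been carried to more decimals before E was computed.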

In terms of the percentage of jobs remaining in the same labor grade or
being displaced into an adjacent one, this hierarchy holds for companies
A and C, but not for Co. B. However, in the case of Co. B the discrepancy
is due to a difference of only one job.

In terms of the percentage of predicted ratings deviating by the value 
of one labor grade or less from the original ratings, the hierarchy holds for 
companies A and B, but not for Co. C. However, here again the dis- 
crepancy is due to a difference of only one job. 


Summary 


1. The basic methodological feature of the present study was to have 
raters in three companies evaluate a standard set of descriptions and 
specifications for 35 representative salaried jobs on a standard job evalua- 
tion manual and on their own respective company manuals. 

2. The Wherry-Doolittle selection method was applied to the standard
manual factor ratings submitted by the raters in each company, with
total rating on the standard manual as the (internal) criterion. The
first four factors identified were the same for each group of raters, although
the order of identification was not the same (3). These results
indicate primarily differences among raters.

3. The Wherry-Doolittle selection method was again applied to the 
standard manual factor ratings submitted by the raters in each company, 
but with total ratings on the respective company manuals as the (ex- 
ternal) criterion. Out of a possible total of twelve factors, eight were 
identified among the first four for all three groups of raters. The striking 
feature of the abbreviated scales derived with external criteria is their 
dissimilarity—as contrasted with the striking similarity of the abbrevi- 
ated scales that emerged with the same internal criterion. These results 
indicate primarily differences among the company manuals, as analyzed 
in terms of the standard manual factors. 

4. An analysis of the adequacy of the abbreviated scales derived from 
the standard manual with internal and external criteria in predicting the 
internal and external criteria indicates in general the following hierarchy 
of accuracy of prediction with abbreviated scales: 

a. Derived with internal criterion and used to predict the internal
criterion.

b. Derived with external criterion and used to predict the external
criterion.

c. Derived with internal criterion and used to predict the external 
criterion. 


Received August 25, 1948. 
References 


1. Benge, E. J., Burk, S. L. H., and Hay, E. N. Manual of job evaluation. New York:
Harper & Brothers, 1941.
2. Chesler, D. J. Reliability and comparability of different job evaluation systems.
J. appl. Psychol., 1948, 32, 465-475.
3. Chesler, D. J. Reliability of abbreviated job evaluation scales. J. appl. Psychol.,
1948, 32, 622-628.
4. Lawshe, C. H., Jr. Studies in job evaluation: II. The adequacy of abbreviated
point ratings for hourly-paid jobs in three industrial plants. J. appl. Psychol.,
1945, 29, 177-184.
5. Lawshe, C. H., Jr., and Alessi, S. L. Studies in job evaluation: IV. Analysis of
another point rating scale for hourly-paid jobs and the adequacy of an abbreviated
scale. J. appl. Psychol., 1946, 30, 310-319.
6. Lawshe, C. H., Jr., and Maleski, A. A. Studies in job evaluation. 3. An analysis of
point ratings for salary paid jobs in an industrial plant. J. appl. Psychol., 1946,
30, 117-128.
7. Lawshe, C. H., Jr., and Wilson, R. F. Studies in job evaluation. 5. An analysis of
the factor comparison system as it functions in a paper mill. J. appl. Psychol.,
1946, 30, 426-434.
8. Stead, W. H., Shartle, C. L., and Associates. Occupational counseling techniques.
New York: American Book Co., 1940.





Studies in Job Evaluation: 8. The Reliability of an 
Abbreviated Job Evaluation System 


C. H. Lawshe and Patrick C. Farbro 


Occupational Research Center, Purdue University 


Several systems for evaluating jobs have been developed. Of these 
much has been written and considerable experimentation has been carried 
on because the setting of wage rates is one of the most important mana- 
gerial functions. The great majority of these systems arrive at their 
goal—the systematic pricing of jobs—by breaking the jobs into their 
various elements or components. The number of elements on which 
jobs have been rated varies from system to system. Using a scaling 
method of some sort, the rater assigns degrees of each component to each 
job, the various degrees are weighted and total point values are con- 
verted into wage rates. 

Previous Studies. As a result of a series of studies by the senior 
author and others (1, 2, 3, 4, 5, 6), an abbreviated system of job evaluation 
has been developed and reported. When forty job descriptions were 
submitted to two groups of independent raters, one of which applied the 
NEMA system and the other used this system, a correlation of .90 
between the two was obtained (7). Lawshe and Wilson (6) have shown 
in a previous study that the abbreviated system of four items is more 
reliable (.98 for five raters) than the NEMA system (.94 for five raters). 
However, since their data were gathered by sending job descriptions by
mail to the cooperating analysts, the question of functional reliability 
in the practical situation remains unanswered. 

Purpose of this Study. The primary purpose of this study was to 
determine the reliability or consistency with which raters, all from the 
same plant, evaluate jobs in that plant by means of this simplified system. 

More specifically, the purposes of this study are: (1) to compare 
reliability coefficients obtained in Lawshe and Wilson’s study of hypo- 
thetical jobs with reliability coefficients in an operating plant; (2) to 
compare independent ratings made by the evaluation committee with 
ratings adjusted through conference discussion; and (3) to examine 
rating differences between labor committee members and management 
committee members. 



Procedure 


The Abbreviated Job Evaluation System. The system of job evaluation 
used in this study is that developed by Lawshe. The system provides 
for the rating of jobs on four scales: “General Schooling,” “Learning
Period,” “Working Conditions,” and “Job Hazards.”

The Job Evaluation Committee. The committee used in evaluating 
the forty-three jobs in this study consisted of five members. Two of 
the members were employees belonging to the union. Management 
representatives on the job evaluation committee included the production 
manager and the secretary of the company. The fifth member of the 
committee was the production superintendent during the time production 
jobs were being evaluated and the maintenance superintendent while 
maintenance jobs were being evaluated. 

Rating the Jobs. The actual procedure of rating the jobs consisted of
several phases which were preceded by standard job description preparation.
After a general orientation, each committee member was furnished
a set of forty-three 3" by 5" white cards on which had been typewritten
each of the forty-three job titles. The committee was then instructed to
consider only the “General Schooling” required for performing each job
and on that basis to place the cards in rank order from the job requiring
the greatest amount of schooling to the job requiring the least amount
of schooling for successful performance on the job. On completion of
this task, a set of six colored cards representing each of the six degrees of
the “General Schooling” scale was given each committee member. Members
were then instructed to insert the colored cards in their stack of white
cards at the places most logical for the breaks. Thus the degrees of the
“General Schooling” scale were assigned. In similar manner, “Learning
Period,” “Working Conditions,” and “Job Hazards” scales were employed
in rating each job.
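The card-sorting step above amounts to cutting a ranked stack into consecutive groups, one per degree card. A minimal Python sketch follows, assuming each colored card is filed directly above the jobs it governs; the job titles, card positions, degree numbering, and function name are all hypothetical and serve only to show the bookkeeping.

```python
from bisect import bisect_right

def assign_degrees(ranked_jobs, card_positions, degree_labels):
    """Map each job in a ranked stack to the degree card filed above it.

    `card_positions[k]` is the index in `ranked_jobs` before which the k-th
    degree card is inserted. Two cards at the same position leave the first
    of them with no jobs, which is allowed.
    """
    if len(card_positions) != len(degree_labels):
        raise ValueError("one position is needed per degree card")
    if card_positions != sorted(card_positions) or card_positions[0] != 0:
        raise ValueError("positions must ascend and the first card must top the stack")
    return {
        job: degree_labels[bisect_right(card_positions, i) - 1]
        for i, job in enumerate(ranked_jobs)
    }

# Hypothetical stack, ranked from most to least schooling required.
stack = ["Toolmaker", "Electrician", "Machine operator", "Packer", "Janitor"]
positions = [0, 1, 1, 3, 4, 5]   # six degree cards; degree 5 and degree 1 are unused
labels = [6, 5, 4, 3, 2, 1]      # assumed numbering: 6 = most schooling
degrees = assign_degrees(stack, positions, labels)
print(degrees)
```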

From these cards a summary page showing the degrees assigned each 
job by each committee member was prepared and anchor jobs, those on 
which at least four of the members initially agreed, were identified. The 
committee was then again assembled and by using the anchor jobs 
as reference points, members discussed and adjusted the ratings for those 
jobs on which there was disagreement. It is important to note, however, that
the initial ratings were made without committee discussion.
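The identification of anchor jobs can be illustrated with a short Python sketch. It assumes the agreement rule is applied to one scale at a time and that "agreement" means identical degrees; the ratings and names below are invented for the example.

```python
from collections import Counter

def anchor_jobs(ratings, min_agree=4):
    """Return {job: agreed_degree} for jobs on which at least `min_agree`
    raters assigned the same degree on a given scale."""
    anchors = {}
    for job, degrees in ratings.items():
        degree, votes = Counter(degrees).most_common(1)[0]
        if votes >= min_agree:
            anchors[job] = degree
    return anchors

# Hypothetical initial degrees from five committee members on one scale.
ratings = {
    "Job 1": [3, 3, 3, 3, 2],   # four of five agree: anchor job
    "Job 2": [4, 3, 5, 4, 2],   # no sufficient agreement
    "Job 3": [1, 1, 1, 1, 1],   # unanimous: anchor job
}
anchors = anchor_jobs(ratings)
print(anchors)
```

Jobs not returned by `anchor_jobs` are those that would be taken up in the conference discussion.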


Results 


The Obtained “one against one” Reliability Coefficients. Shown in
Table 1 in the second column are the obtained reliability coefficients for
each of the items of the abbreviated system and for total points as jobs
were evaluated in this study. The figures shown in this column are the
averages of the coefficients obtained by correlating initial ratings of each
rater with initial ratings of every other rater.¹ The figures shown in
column two are the most likely correlations between the ratings of one
rater and the ratings of one other rater. For convenience and for comparison
with the previous study by Lawshe and Wilson (6), these have
been called the “one against one” reliabilities.


Table 1

Reliability Coefficients for Total Point Ratings and for the Component Scale
Ratings in the Lawshe-Wilson Study and in This Study

                        “One against one”        “Five against five”
                           Reliability               Reliability
                        Lawshe-     This         Lawshe-     This
Item                    Wilson      Study        Wilson      Study

Total Points             .89         .91          .98         .98
Learning Period          .86         .84          .97         .96
General Schooling        .79         .84          .95         .96
Working Conditions       .61         .73          .89         .93
Job Hazards              .51         .54          .84         .86





The “five against five” Reliability Coefficients. Even though the reliability
coefficients shown in column two are those actually obtained,
they are inadequate estimates of the true reliability of pooled ratings of
members of the committee. As was mentioned before, the “one against
one” reliabilities are the best estimate of reliability of the ratings of one
rater as compared with those of one other rater. Since five raters were
involved in the rating of each job, the reliabilities in the second column
were “stepped up” by use of the Spearman-Brown formula to estimate
the reliabilities of the pooled ratings of all five of the job evaluation committee
members. These “stepped up” reliabilities are shown in Table 1 in
the fourth column.
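The Spearman-Brown step-up for k raters is r_k = k r / (1 + (k - 1) r), where r is the average single-rater reliability. The short Python sketch below applies it with k = 5 to three of the second-column values as printed in Table 1; the function and variable names are illustrative only.

```python
def spearman_brown(r, k):
    """Estimated reliability of the pooled ratings of k raters, given the
    average single-rater ("one against one") reliability r."""
    return k * r / (1.0 + (k - 1) * r)

# "One against one" reliabilities for this study, as printed in Table 1.
one_against_one = {
    "Total Points": 0.91,
    "Learning Period": 0.84,
    "General Schooling": 0.84,
}
five_against_five = {
    item: round(spearman_brown(r, 5), 2) for item, r in one_against_one.items()
}
print(five_against_five)
```

These inputs give .98, .96, and .96, in agreement with the fourth column of Table 1.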

It is not advocated that these coefficients of reliability be accepted as 
absolute, but merely that they are estimates of the true reliabilities of
the abbreviated job evaluation system. The results presented should 
be qualified in view of one’s own evaluation of the assumptions involved 
in such a procedure. 

It is evident from column four that reliability coefficients of the magnitude
found are definitely high enough for purposes for which the system
was designed. As will be noted, “five against five” reliability coefficients
for the four scales range from .86 (Job Hazards) to .96 (Learning Period),
with all but one scale, “Job Hazards,” above .90. Agreement among
raters as evidenced by a reliability coefficient of .98 for total points
definitely indicates high enough reliability for most practical purposes.

¹ Correlations were found between the following patterns of pairs of raters: A-B,
A-C, A-D, A-E, A-F, B-C, B-D, B-E, B-F, C-E, C-F, D-E, D-F, E-F. Obtained
reliability coefficients were then averaged by transformation to Fisher Z-values (9).
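The averaging of pairwise coefficients through Fisher's z transformation, mentioned in footnote 1, can be sketched in Python as follows. The correlations used are hypothetical, and the sketch assumes an unweighted mean of the z-values; whether the authors weighted pairs by the number of jobs rated in common is not stated.

```python
import math

def mean_correlation(rs):
    """Average correlations via Fisher's z: z = atanh(r), take the mean
    of the z-values, then transform back with tanh."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))

pairwise = [0.95, 0.90, 0.80, 0.60]   # hypothetical pairwise correlations
averaged = mean_correlation(pairwise)
simple_mean = sum(pairwise) / len(pairwise)
print(round(averaged, 3), round(simple_mean, 3))
```

Because the transformation stretches values near 1.0, the z-averaged coefficient for these inputs is somewhat higher than the simple arithmetic mean.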

Comparison of Lawshe-Wilson Study and This Study. The first item 
of interest in comparing the data from the two studies in Table 1 is the 
close agreement of the reliabilities found for total point ratings (.89 and
.91 for “one against one” reliabilities and .98 and .98 for “five against
five” reliabilities).

The single items found most reliable in this study are those of the 
“Skill Demands” factor (Learning Period and General Schooling) and 
are the same as those found most reliable in the previous study. 

The next most reliable item in this study (Working Conditions) has
the same rank position in the Lawshe-Wilson study. The least reliable
scale of the abbreviated system (Job Hazards) was found in the same
rank position in both studies.

It is interesting to note that the rank order of magnitude of the reliability
coefficients is essentially the same in both studies. It appears that
the reliabilities estimated in the Lawshe-Wilson study were conservative
estimates of the reliability of the abbreviated system when
employed in an actual industrial situation. This is easily understood
since in the Lawshe-Wilson study the several raters were from different 
plants in scattered geographical locations and used only job titles and 
descriptions in evaluating the jobs, while in this study five raters were 
evaluating definite jobs in a plant with which each was familiar. 

Comparison of Management and Labor Ratings. In comparing the 
reliability of labor and management committee members, the first item of 
interest is the consistency of the findings as shown in Table 2. The 
correlation coefficients representing reliability or agreement between the 
labor committee members are consistently lower than those representing 
agreement between two management members. The “one against one”
reliability coefficients for labor union committee members range from .37
(Working Conditions) to .83 (Total Points), while for management 
members they range from .80 (Job Hazards) to .94 (Total Points). 

Considering agreement between two labor union committee members
and two management representatives,² “one against one” reliability
coefficients range from .66 (Job Hazards) to .86 (Total Points). These
reliability coefficients, it will be noted from Table 2, fall between those of 
the labor members which are lowest and those of management repre- 
sentatives which are highest. 


² Mean of obtained correlations between each management member and each labor
member was derived by transformation to Fisher Z-values. 







Table 2

Coefficients of Reliability for Two Labor and Two Management Job
Evaluation Committee Members

                        Labor-      Mgmt-       Labor-
Item                    Labor       Mgmt        Mgmt

Total Points             .83         .94         .86
Learning Period          .73         .86         .80
General Schooling        .71         .92         [illegible]
Working Conditions       .37         .90         .71
Job Hazards              .68         .80         .66





Comparison of Initial Ratings with Adjusted Ratings. As was pre- 
viously mentioned two sets of ratings were available for each job title— 
initial ratings, independently assigned by the raters for each of the 
various scales for each job title, and adjusted ratings, those resulting from 
conference discussion. The mode of the adjusted ratings was the point 
value actually used as the basis for the wage structure in the plant. The 
mean or average points assigned by each were used in the Lawshe-Wilson 
study since it was impossible to assemble the various raters in a conference 
for the purpose of adjusting ratings. For this reason it was considered 
advisable to obtain a measure of relationship between the mean initial 
ratings and the mode of adjusted ratings. In Table 3 the correlation is 
shown to be .97 for Total Points, and to range from .83 (Job Hazards) 
to .94 (General Schooling) for the component scales of the abbreviated 
system. These values are probably large enough to support the hy- 
pothesis that conclusions based upon mean independent ratings are valid 
for plant situations in which majority decisions are reached. 

Table 3

Coefficients of Correlation Between Mean of Initial Ratings and
Mode of Adjusted Ratings

Item                      r

Total Points             .97
Learning Period          .92
General Schooling        .94
Working Conditions       .86
Job Hazards              .83

Similarly, it seemed desirable to investigate separately the relationship
between initial ratings as made by management, labor, and maintenance
and production superintendents and the mode of adjusted
ratings. Table 4 shows these relationships. Coefficients of correlation
between the mean of management representatives’ initial ratings and the
mode of adjusted ratings were found to be consistently larger (ranging
from .73 on “Job Hazards” scale to .97 for Total Points) than those of
labor union members (.65 to .93 on the same scales). The rank order of
magnitude of the coefficients is the same for both management and labor
union committee members.


Table 4

Obtained Coefficients of Correlation Between Initial Ratings for Management, Labor,
and Superintendents, and Mode of Adjusted Ratings

                                      Superintendents
Item                    Mgmt        Maint.      Prod.       Labor

Total Points             .97         .97         .97         .93
Learning Period          .95         .96         .96         .92
General Schooling        .93         .93         .88         .86
Working Conditions       .86         .85         .90         .69
Job Hazards              .73         .77         .89         .65





Also shown in Table 4 are the correlations between the maintenance 
and production superintendents’ initial ratings and the mode of adjusted 
ratings. These coefficients (ranging from .77 on “Job
Hazards” scale to .97 for Total Points) are higher than those of the labor
union committee members. They are also larger than those of the 
management committee members on all but the “General Schooling” 
scale. 

In considering the change from initial ratings to adjusted ratings, it 
was also deemed advisable to examine actual point value changes. This 
was accomplished by tabulating each rater’s actual point change from 
his initial ratings to his adjusted ratings. The point values for each item
and the total points assigned by each rater for each job were considered in
this analysis. For example, on job number 1, Rater A initially rated 
the job as being worth 130 points on the “Learning Period” scale. During 
the conference discussions his rating was changed to 150 points; thus a 
+20 was tabulated. This procedure was followed throughout. 

Shown in Table 5 is the mean gross change per job by raters from
initial point ratings to adjusted point ratings. These point changes were
derived as above by adding point value changes disregarding algebraic
signs. From Table 5 a general trend may be seen. Raters “E” and “F”,
both labor members, have the greatest average gross change from initial
to adjusted ratings while Rater “C”, the maintenance superintendent,
changed his initial ratings the least. Raters “A” and “B”, the two
management committee members, changed less (10.4 and 10.1 average
points per job, respectively) from initial ratings to adjusted ratings than
did the labor committee members (18.6 and 18.3 average points per
job for Raters “E” and “F”, respectively).

Table 5

Mean Gross Change per Job of Raters from Initial Point Ratings
to Adjusted Point Ratings

                         Total     Learning   General     Working      Job
Rater             N      Points    Period     Schooling   Conditions   Hazards

A (Mgmt)          43     10.4       6.9        3.1         1.3         [illegible]
B (Mgmt)          43     10.1       4.4        5.0          .9          .9
C (Sup-Maint)     16      1.8       1.2        0.0         [illegible]  .2
D (Sup-Prod)      27     10.2       3.8        5.2          .2         1.2
E (Labor)         43     18.6       8.7       12.4         1.1          .9
F (Labor)         43     18.3      12.0        5.7         1.9          .8

In Table 6 the average net change per job by raters from initial point
ratings to adjusted point ratings is shown. These values were obtained
in the same manner as described above except that algebraic signs were
considered. In general the trend shown in Table 6 is that Raters “A”
and “B”, the two management raters, and Raters “C” and “D”, the
maintenance and production superintendents, initially tended to under-rate
the jobs except in relation to the “Working Conditions” scale;
therefore, they had to increase point ratings in the conferences, while
Raters “E” and “F”, both labor union members, over-rated jobs on the
“General Schooling” and “Working Conditions” scales but under-rated
on the “Learning Period” and “Job Hazards” scales.
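The distinction between the gross change of Table 5 and the net change of Table 6 can be made concrete with a small Python sketch. Only the first pair of values (130 points adjusted to 150) comes from the example in the text; the remaining figures and the function name are hypothetical.

```python
def mean_gross_and_net_change(initial, adjusted):
    """Return (mean gross change, mean net change) per job.

    Gross change ignores algebraic sign; net change keeps it, so increases
    and decreases can cancel.
    """
    changes = [a - i for i, a in zip(initial, adjusted)]
    n = len(changes)
    gross = sum(abs(c) for c in changes) / n
    net = sum(changes) / n
    return gross, net

initial_points = [130, 90, 60, 45]    # first value from the text's example
adjusted_points = [150, 80, 60, 50]   # changes: +20, -10, 0, +5
gross, net = mean_gross_and_net_change(initial_points, adjusted_points)
print(gross, net)
```

A rater whose upward and downward adjustments offset one another will show a small net change alongside a large gross change, which is why the two tables are read together.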

Table 6

Average Net Change per Job of Raters from Initial Point Ratings
to Adjusted Point Ratings

                         Total     Learning   General     Working      Job
Rater             N      Points    Period     Schooling   Conditions   Hazards

A (Mgmt)          43     +6.9      +5.8       +1.8        -1.3         +.6
B (Mgmt)          43     +2.6      +.1        +2.6        -.9          +.8
C (Sup-Maint)     16     +1.3      +1.2        0          [illegible]  [illegible]
D (Sup-Prod)      27     +7.9      +3.8       +3.0        [illegible]  +1.2
E (Labor)         43     -2.0      +5.6       -6.9        -.9          +.2
F (Labor)         43     +8.3      +9.3       -.3         -1.3         +.6

It is interesting to note that the mean of the initial points as conceived
by the evaluation committee is 250.30 while after conference discussion
the mean of adjusted ratings is 254.63. In analyzing this difference of
4.33 points, a critical ratio of 2.47 was found, indicating the difference
to be significant at the 2 per cent level of confidence.
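The stated significance level can be checked approximately by referring the critical ratio to the normal curve. The Python sketch below assumes a two-tailed test against the unit normal distribution; the text does not say which table the authors consulted, and the function name is illustrative only.

```python
import math

def two_tailed_p(critical_ratio):
    """Two-tailed probability of a unit-normal deviate at least as large
    in absolute value as `critical_ratio`."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2.0))

p = two_tailed_p(2.47)
print(round(p, 4))
```

Under these assumptions the probability falls between .01 and .02, which is consistent with the difference being reported as significant at the 2 per cent level.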


Summary and Conclusions 


Job evaluation data for forty-three jobs from a manufacturing plant 
using an abbreviated evaluation system were analyzed. The job evalua- 
tion committee, including two management members, two employees 
affiliated with the labor union active in the plant, and the maintenance 
superintendent or production superintendent when maintenance or pro- 
duction jobs were being considered, evaluated each job on the four items 
of the abbreviated system. 

Reliability coefficients for the total point ratings and for the individual 
scales were obtained by correlating ratings given each job title on the 
basis of each of the four factors of the evaluation system. Correlations 
were found between the ratings of each rater as paired with every other 
rater and these obtained coefficients were averaged after transformation 
to Fisher Z-values. These average intercorrelations were then stepped- 
up using the Spearman-Brown formula to obtain the estimated reliability 
of the ratings of the five-member committee. 

Comparison was made with a previously published study by Lawshe 
and Wilson which employed the abbreviated evaluation system. Analysis
was also made comparing independent ratings with ratings adjusted
after conference discussion. Differences in the agreement or consistency
with which labor and management committee members rate jobs
were also explored.

The following conclusions are supported: 


1. The abbreviated system demonstrates reliability sufficiently high 
for most practical purposes (.98 for five raters). 

2. Comparison with the Lawshe-Wilson study shows the same rank-order
pattern of reliabilities for total points and the individual scales.

3. In comparing relative reliabilities of management and labor committee
members, those of management were found consistently higher
(ranging from .80 to .94) than those of labor (ranging from .37 to .83).
The magnitude of reliability or agreement of average intercorrelations 
of the management and labor union committee members combined (range 
from .66 to .86 for one rater) falls between those of labor union members 
which are of lowest magnitude and those of management representatives 
which are of highest magnitude. 







4. High correlation was found between mean initial ratings (those 
assigned independently) and the mode of adjusted ratings (adjusted 
during conference discussion) for Total Points (.97) and also for the 
component items of the system (ranging from .83 to .94).

5. Initial ratings as conceived by management, labor, and the super- 
intendents separately when compared with the mode of adjusted ratings 
showed the superintendents to initially rate jobs more accurately as 
evidenced by correlation with adjusted ratings than did management 
committee members or labor union committee members. Management 
members’ initial ratings agreed more closely with final ratings than did 
labor union members’ initial ratings. The same relationship was found 
in analyzing actual point changes from initial to mode of adjusted ratings. 

6. Throughout this analysis the Skill Demands factor as measured by
“Learning Period” and “General Schooling” items was found the most
stable as evidenced by the fact that average intercorrelations for these
two scales were largest; that agreement between management and labor
members on these two scales was greater than on other scales; and that
correlations between initial and adjusted ratings on these two scales were
higher than on scales of the “Job Characteristics” factor.


Received December 22, 1948. 
Early publication. 


References 


1. Lawshe, C. H., Jr., and Satter, G. A. Studies in job evaluation. 1. Factor analysis
of point ratings for hourly paid jobs in three industrial plants. J. appl. Psychol.,
1944, 28, 189-198.

2. Lawshe, C. H., Jr. Studies in job evaluation. 2. The adequacy of abbreviated
point ratings for hourly-paid jobs in three industrial plants. J. appl. Psychol.,
1945, 29, 177-184.

3. Lawshe, C. H., Jr. Studies in job evaluation. 3. An analysis of point ratings for
salary paid jobs in an industrial plant. J. appl. Psychol., 1946, 30, 117-128.

4. Lawshe, C. H., Jr., and Alessi, S. L. Studies in job evaluation. 4. Analysis of
another point rating scale for hourly-paid jobs and the adequacy of an abbreviated
scale. J. appl. Psychol., 1946, 30, 310-319.

5. Lawshe, C. H., Jr., and Wilson, R. F. Studies in job evaluation. 5. An analysis
of the factor comparison system as it functions in a paper mill. J. appl. Psychol.,
1946, 30, 426-434.

6. Lawshe, C. H., Jr., and Wilson, R. F. Studies in job evaluation. 6. The reliability
of two point rating systems. J. appl. Psychol., 1947, 31, 355-365.

7. Lawshe, C. H., Jr., Dudek, Edmund E., and Wilson, R. F. Studies in job evaluation.
7. A factor analysis of two point rating methods of job evaluation. J.
appl. Psychol., 1948, 32, 118-129.

8. Peters, C. C., and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill Book Co., 1940.

9. Snedecor, G. W. Statistical methods. Ames, Iowa: Iowa State College Press, 1946.





Odor Selection, Preferences and Identification 


Bernard Locke and Charles H. Grimm 
Brooklyn, N.Y. 


In light of the fact that many millions of dollars are spent annually 
in the purchase of aromatic products it is extremely surprising that so 
little work has been done in any systematic fashion to evaluate some of 
the factors which lead an individual to select a particular aromatic 
compound for purchase. It is the purpose of this paper to explore, in a 
preliminary fashion, several of the factors which might play a part in 
such selection. 

The broad elements to be dealt with in this research include: 1. The
ability to differentiate between “expensive” and “inexpensive” odors.
2. The relationship between subjective concepts of costliness and
“pleasantness” or “unpleasantness” of a perfume compound. 3. The ability to
recognize some of the more common floral odors.

The 69 female subjects used were a select rather than a cross section 
sampling in that they were students in an advanced collegiate course in 
psychology and our interpretations of the results will, therefore, take 
this into consideration. The average age of the group was 24.7 years 
with a range from 19 to 50 years. The length of time that these indi- 
viduals had been using perfumes ranged from one to twenty-five years 
with a mean of 7.2 years. 


Experiment 1. The Ability to Differentiate Between
“Expensive” and “Inexpensive” Odors


A search of the psychological literature for the past five years reveals
only one experimental exploration of the ability of individuals to
differentiate between expensive and inexpensive perfumes. In this
experiment G. M. Jewett¹ employed three pairs of perfumes, each containing
an inexpensive member (50¢ an ounce) and an expensive one ($8.00 to
$16.00 per ounce). His subjects were asked to compare them as to
general “desirability” or affect and “lasting quality” purely on the
basis of the smell stimulus. Jewett concluded from his data that in
both respects the inexpensive perfumes produced substantially the same
results as the expensive.

1 Jewett, G. M. A note on the relation between subjective estimates of the desira- 
bility and the lasting quality of certain perfumes and their cost. J. gen. Psychol., 
1945, 33, 285-290. 



In the present experiment the 69 subjects were individually given 
perfumers’ blotters that had been dipped into standard strength samples 
(16 oz. of oil to 128 oz. of alcohol) of eight perfumes and asked to indicate 
on a check sheet whether they thought the perfume to be an expensive 
or inexpensive one and at the same time whether they thought it a 
pleasant or unpleasant one. A description of the perfume oils, odor 
types and their costs is as follows. 


Each of the oils has been found to be commercially acceptable and has 
been in use for a period of years. The average cost of the inexpensive oils 
(Numbers 1, 3, 5 and 7) is $5.00 per pound and the average cost of the expen- 
sive compounds (Numbers 2, 4, 6 and 8) is $60.00 per pound. The floral odors 
used were selected for their high fidelity in reproducing the actual floral note
demonstrated in many years of use. Odor No. 1. A heavy sweet, balsamic, 
amber type; 2. A subtle chypre-floral, French, modern bouquet; 3. A modern, 
sweet, resin, aldehyde-chypre type; 4. A modern, floral-spice, fantasy type; 
5. A sweet, modern, trefle, ‘outdoor’ type; 6. A sophisticated, aldehyde- 
floral, fantasy type; 7. A modern, aldehyde, French type; and 8. A heavy, 
sweet, balsamic, amber type. 


Table 1
Subjective Estimates of Cost of Eight Perfume Samples
Note: Items Marked with a * Are the “Expensive” Compounds

Perfume                                     Per Cent of
No.         Inexpensive     Expensive       Correct Responses

1               49              20                 71
2*              26              43                 62
3               41              28                 59
4*              44              25                 36
5               41              28                 59
6*              36              33                 48
7               44              25                 64
8*              39              30                 43





Table 1 presents the selections. The range of correct estimations of 
cost runs from 36 per cent to 71 per cent. If the responses for all eight 
odors are averaged the mean percentage of correct responses is 55, or 
just slightly better than if the selections had been made purely by chance. 
However, if we consider the accuracy of the judgments as regards the 
expensive and inexpensive odors separately we find that 63.3 per cent of 
the subjects made accurate choices of the inexpensive odors as compared 
to 47.25 per cent correct choices for the expensive odors. The computed 
critical ratio is 2.56 indicating that the difference is significant at the 
2 per cent level but not at the 1 per cent level. 
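The percentages quoted above can be rechecked from the counts in Table 1. The following is a minimal sketch, not the authors' own procedure: it assumes that a "correct" response is the count in the column matching the perfume's true cost class, and that each percentage is rounded to a whole number before averaging. The names `TABLE_1`, `percent_correct` and `mean` are introduced here for illustration only. The critical ratio is not recomputed, since the text does not say which standard-error formula was used.

```python
# Judgments from Table 1:
# (perfume no., is_expensive, n judged inexpensive, n judged expensive)
TABLE_1 = [
    (1, False, 49, 20),
    (2, True, 26, 43),
    (3, False, 41, 28),
    (4, True, 44, 25),
    (5, False, 41, 28),
    (6, True, 36, 33),
    (7, False, 44, 25),
    (8, True, 39, 30),
]


def percent_correct(is_expensive, n_inexpensive, n_expensive):
    """Whole-number percentage of judges who named the true cost class."""
    correct = n_expensive if is_expensive else n_inexpensive
    return round(100 * correct / (n_inexpensive + n_expensive))


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


per_perfume = {no: percent_correct(exp, a, b) for no, exp, a, b in TABLE_1}
overall = mean(per_perfume.values())
inexpensive_mean = mean(per_perfume[no] for no, exp, _, _ in TABLE_1 if not exp)
expensive_mean = mean(per_perfume[no] for no, exp, _, _ in TABLE_1 if exp)

print(per_perfume)
print(overall, inexpensive_mean, expensive_mean)
```

Under these assumptions the sketch reproduces the last column of Table 1 and the two group means reported in the text (63.3 and 47.25 per cent, the former to one decimal place).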

If one considers the direction of the errors made it is found that in
38 per cent of the estimations inexpensive perfume compounds were
classed as “expensive” while 53 per cent of the estimations of the
expensive compounds categorized them as being “inexpensive.” Thus,
we note a distinct tendency to minimize rather than to exaggerate the
“values” of the odor samplings.

The mean number of correct identifications as to relative costliness 
of the eight perfume samples was 4.4. Not one of the 69 individuals 
was able to classify all eight correctly nor did any individual fail to make 
a single correct choice. 

In order to determine whether length of use plays any part in
developing skill in differentiation between expensive and inexpensive odors
the group was divided into those who had used perfume from 0 to 5 years
(N = 32) and those who had been using it for 6 or more years (N = 37).
A comparison of the number of correct selections of the members of these
two groups reveals that there is no demonstrable improvement in ability
to differentiate between the expensive and inexpensive odors with
increasing numbers of years of perfume usage. This is best demonstrated
by the fact that the average number of appropriate selections for both
of the groups is identical, namely, 4.4 correct.

In order to evaluate the role of frequency of use as opposed to length of
use of perfume in developing the ability to differentiate between expensive
and inexpensive perfumes the subjects were asked to indicate the
frequency with which they used perfumes. This was done on a four point
check list which was made up of the following steps: Frequently,
Occasionally, Rarely, Not at all. The need for such an evaluation is best
illustrated by the response of the oldest member of the group who, in reporting
the number of years that she had used perfume, replied, “Once a year
for twenty-five years.” Because of the small size of the experimental
group the one subject who fell in the “not at all” category, and who,
incidentally, made 5 correct selections, has been thrown into the “rarely”
group. The results indicate that for the present experimental sample
there is no measurable difference in ability to discriminate expensive
from inexpensive perfumes among individuals who use perfumes
frequently, occasionally or rarely, the mean number of correct choices being
4.3, 4.5 and 4.5 respectively.


Experiment 2. Relationship Between Subjective Concepts of Costliness
and “Pleasantness” or “Unpleasantness” of a Perfume Compound


Since it is fairly common experience that with some individuals
commodities can be “costly” and still “unpleasant” and vice versa it was
decided to explore the frequency with which such variations occurred.
At the time that each of the subjects determined whether a sample was
expensive or inexpensive she was also asked to indicate whether the odor
was pleasing or unpleasant to her. Table 2 presents the frequency with
which each of the eight odors used in Experiment 1 was designated with
the apparently contradictory adjectives “Inexpensive and Pleasant” or
“Expensive and Unpleasant.” From this table we note that a considerable
amount of disagreement exists between the individual’s evaluation
of the cost of each of the perfumes and its pleasantness. This difference
actually constitutes an average of 31.5 per cent or, virtually, one-third
of the total number of comparisons made. When the discrepancies for
the “expensive” and “inexpensive” groups of perfumes are compared no
difference is found. The mean percentage of differences is 31.8 per cent
for the inexpensive odors and 31.3 per cent for the expensive group.
While there was a slightly greater tendency to attribute unpleasantness
to odors thought to be costly than to consider as pleasant those
compounds which were thought to be inexpensive, this difference is not
sufficiently great to be significant.


Table 2
Differences in Subjective Concepts of Costliness and Pleasantness or
Unpleasantness of 8 Perfume Compounds
Note: Those Perfumes Marked with a * Are Expensive
Note: Bracketed Entries Are Not Legible in the Source; They Are the Row
Sums Implied by the Other Columns (N = 69)

Perfume     Inexpensive-    Expensive-      Total           Total
No.         Pleasant        Unpleasant      Disagreement    Agreement

1               13               6              19              50
2*              15              11              26              43
3               10              14              24              45
4*              14               9              23              46
5               12              14             [26]             43
6*               9              10             [19]            [50]
7                5              14             [19]             50
8*               3              16             [19]             50





In considering the number of instances in which there was disagree- 
ment between the concepts of costliness and pleasantness for each of the 
individuals we learn again of the disagreement in attitudes between cost 
and pleasantness. The mean number of disagreements for each of the 
individuals in terms of pairing inexpensiveness and pleasantness is 1.2 
and the mean for the expensive-unpleasant pair is 1.5. In one instance 
where the subject classed all of the perfumes as expensive, she also con- 
sidered them all as being unpleasant. 







Experiment 3. An Investigation of the Ability of a Group of Subjects 
to Recognize Some of the More Common Floral Odors 


This section of the research was intended to examine the ability of the
same experimental group to identify some of the more common floral
odors. The eight odors used were Lilac, Gardenia, Carnation, Rose,
Pine, Jasmin, Lily of the Valley and Geranium, presented in that order.
Each subject was permitted to smell each of the odors on perfumers’
blotters after having been told that each of the odors that she would now
smell was that of a flower and that she was to identify it by name. Table
3 shows the number of correct identifications of each of the eight odors.


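Table 3
Correct Identifications of Eight Floral Odors
Note: Bracketed Counts Are Not Legible in the Source; Each Is the Only
Whole Number out of 69 That Rounds to the Stated Percentage

                       Number of Correct
                       Identifications       Correct Identification
Floral Odor            (N = 69)              in Percentages

Lilac                      [24]                    35
Gardenia                   [23]                    33
Carnation                  [21]                    30
Rose                        17                     25
Pine                        28                     41
Jasmin                      [1]                     1
Lily of the Valley         [16]                    23
Geranium                     0                      0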





Examination of Table 3 shows that the number of correct identifications
of the floral odors used ranges from 0 (for Geranium) to 28 (for Pine), or
from 0 to 41 per cent of correct identifications. If one averages the
correct responses for all of the eight odors the resultant percentage of
correct responses is 23.5. The apparent order of difficulty in
identification, ranging from most difficult to least difficult, is Geranium,
Jasmin, Lily of the Valley, Rose, Carnation, Gardenia, Lilac and Pine. While it is
somewhat surprising that so much difficulty was evidenced in identifying
the various odors it is particularly interesting that the rose, which is so
common and popular in our culture, caused so much difficulty in
recognition, with only one out of every four subjects being able to identify it
correctly.

Table 4 presents the findings for the number of correct identifications 
by each member of our experimental group. This table reveals that 12 
per cent of our subjects were unable to identify even one of the floral 
odors used and, similarly, there was no individual who identified more 
than four of the eight floral odors that we used. 







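Table 4
Correct Identifications of the Series of Eight Floral Odors
Note: Bracketed Entries Are Not Legible in the Source; They Follow from
the Column Totals (N = 69 and 100 Per Cent)

Number of Correct
Identifications                                  Per Cent of
(N = 69)               Number of Individuals     Total Group

0                               8                    12
1                              22                    31
2                              17                    25
3                              16                   [23]
4                               6                    [9]
5                               0                    [0]
6                               0                    [0]
7                               0                    [0]
8                               0                    [0]

Mean = 1.8                                          100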





To illustrate the wide deviations in identification made by members
of the group Table 5 presents the identities attributed to our samples of
Rose and Carnation.


Table 5
Identifications of Rose and Carnation Samples Made by the 69 Subjects
Note: Only the First Two Rose Frequencies Are Legible in the Source

Rose                                       Carnation

Don’t Know                        27       Don’t Know
Rose                              17       Carnation
Lily, Lily of the Valley,
  Easter Lily                              Gardenia
Gardenia                                   Geranium
Lilac                                      Jasmin
Sweet Pea                                  Spice
Jasmin                                     Rose
Bouquet                                    Orange Blossom
Cold Cream                                 Chrysanthemum
Baby’s Breath                              Mint
Orange                                     Lavender
Lemon Verbena                              Musk Blossom
Geranium                                   Clover
Carnation


In order to determine the effect of knowledge of the identity of the
floral odors under investigation upon the accuracy of the identifications
one-half of the experimental group (35 subjects) was asked to repeat
this portion of the experiment but this time they were given the names
of the odors presented in random order. Table 6 presents a comparison
of the findings for this group and the original group. Examination of
this table reveals a rather marked improvement in identifications in all
of the odors except rose, which for some undetermined reason shows a
minor decline. The average improvement for the eight odors combined
is 21 per cent but the range is wide since it runs from -2 per cent (for
Rose) to +53 per cent (for Pine).


Table 6
Comparison of Accuracy of Identification of Floral Odors with and without
Knowledge of Their Identities

                       Correct Responses       Correct Responses
                       with Knowledge          without Knowledge
Floral Odor            of Identities           of Identities

Lilac                       57%                     35%
Gardenia                    46%                     33%
Carnation                   54%                     30%
Rose                        23%                     25%
Pine                        94%                     41%
Jasmin                      20%                      1%
Lily of the Valley          40%                     23%
Geranium                    20%                      0%

Mean                        44%                     23%
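The improvement figures quoted for Table 6 are simple differences between its two columns. As a minimal sketch (the names `TABLE_6`, `gains` and `mean_gain` are introduced here for illustration and do not come from the paper), the per-odor gain and its range can be recomputed as follows.

```python
# Per cent correct from Table 6:
# odor -> (with knowledge of identities, without knowledge of identities)
TABLE_6 = {
    "Lilac": (57, 35),
    "Gardenia": (46, 33),
    "Carnation": (54, 30),
    "Rose": (23, 25),
    "Pine": (94, 41),
    "Jasmin": (20, 1),
    "Lily of the Valley": (40, 23),
    "Geranium": (20, 0),
}

# Gain in percentage points once the list of odor names was supplied.
gains = {odor: with_k - without_k for odor, (with_k, without_k) in TABLE_6.items()}
mean_gain = sum(gains.values()) / len(gains)
smallest = min(gains, key=gains.get)
largest = max(gains, key=gains.get)

for odor, gain in gains.items():
    print(f"{odor:20s} {gain:+d}")
print(f"mean gain = {mean_gain:.2f} points ({smallest} lowest, {largest} highest)")
```

The mean of the eight differences is 20.75 points, which agrees with the reported 21 per cent; note that the two groups differ in size (35 versus 69 subjects), so these are differences between group percentages rather than within-subject changes.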

When one considers the contrast between the number of correct 
selections made by each of the subjects before and after the identities of 
the floral odors were given, one finds that while the mean number of 
correct responses has advanced from 1.8 to 3.5, there is still considerable 
room for improvement. It is interesting to note that while none of the 
69 subjects was able to identify more than four of the odors prior to their 
identities having been made known, 10 of the 35 subjects were able to do 
so after the list was made available. Two of the 35 were able to identify 
all eight samples correctly. 


Summary and Conclusions 


Employing 69 college students as the experimental group, an attempt
has been made to evaluate some of the factors that play a part in odor
preferences and identifications. The results obtained are not intended
to indicate universal trends, since a select group was used, but they do
point to the need for further investigation in this area.


1. For the experimental group used the ability to recognize the 
difference between expensive and inexpensive perfume compounds was 
only slightly better than chance, with the mean percentage of correct 
responses being 55. 







2. There was a greater tendency to select expensive perfumes as being 
inexpensive than vice versa. 

3. Length of use of perfumes apparently does not affect the ability 
to make accurate judgments as to the costliness of perfume compounds. 

4. Frequency of use does not affect the ability to make accurate 
judgments as to the costliness of perfume compounds. 

5. There is considerable disagreement between the individual’s
evaluation of the cost of a perfume and its “pleasantness.” There was a slightly
greater tendency to attribute unpleasantness to odors thought to be
costly than to consider as pleasant those compounds which were thought
to be inexpensive.

6. Utilizing eight common floral odors it was found that our
experimental group was able to identify them with less than 25 per cent accuracy
(23.5 per cent correct).

7. When 35 subjects were informed as to what eight floral odors were
being utilized their accuracy in identification rose to but 44 per cent.


Received October 25, 1948.
Early publication.





Prediction of Female Readership of Magazine Articles * 


Evelyn Perloff 
Ohio State University 


This is the second of two studies attempting to predict the number
of individuals that will read a magazine article, prior to its publication.
The first study discussed the prediction of male readership of articles in
The Saturday Evening Post. The purpose of the current study was to
determine the way in which five variables combined for maximum female
readership of articles in the Post. The reader who desires complete
details of the procedure used in these studies will find an account in the
reference cited below.¹

The readership results of men and women were handled separately
on the assumption that interest patterns for magazine articles are well
defined according to sex. A comparison of the readership figures of Post
articles for males and females in these studies and many others will clearly
illustrate the varying interests and preferences of the two sexes.²

Inasmuch as starting readership is based upon information obtained
from individual reports from respondents, it is essential to have some
measure of the accuracy of these reports. Ludeke and Inglis compared
the results of what readers of the Ladies’ Home Journal stated they had
read with what they were observed to have read. The results of this
informative experiment showed an average difference of 1.7% between
the two conditions, which seems to justify the conclusion that “reported
reading behavior did not differ materially from active reading behavior.”³
It is likely that similar results would be obtained with The Saturday
Evening Post. At present, the reliability of the criterion probably lies
within an error range of 8% (2σ value).⁴


* This study was conducted while the writer was a research associate in the Develop- 
ment Division of the Research Department, Curtis Publishing Company. 

1 Perloff, E. Prediction of male readership of magazine articles. J. appl. Psychol., 
1948, 32, 663-674. 

2 Waples, D., and Tyler, R. W. What people want to read about. Chicago: Univer- 
sity of Chicago Press, 1931, and unpublished studies, The Curtis Publishing Company, 
Philadelphia, Pa. 

3 Ludeke, H. C., and Inglis, R. A. A technique for validating interviewing methods 
in reader research. Sociometry, 1942, 5, 109-122. 

4 Blankenship, A. B. Consumer and opinion research. New York: Harper and
Brothers, 1943, Appendix, Table 2.




Results 


The five variables used were number of illustrations, color of illustra- 
tions, sex of persons in the illustrations, proportion of opening page(s) 
devoted to text, and subject matter of the article. The findings will be 
presented in three sections: (1) The Distributions, (2) The Determination 
of the Composite Effect, and (3) The Cross-validation. 

The Distributions. All starting readership per cents are indexes and 
not actual figures. 

The relationship (r = .35) of number of illustrations to starting
readership per cent indicated on the face of it that the number of illustrations
significantly influenced the female reader in starting an article. It was
apparent that there were no clear-cut breaks in the distributions, as were
present in the study on male readership. Although there was a slight
upward trend in starting readership from articles having no illustrations
to those having eight or more, this trend was not very distinct. It was
clear, however, that female Post readers preferred articles with many
illustrations as compared to those with no illustrations. Both men and
women were equally influenced by this variable (r’s = .35) when its
effect on starting readership was determined while all other variables
were permitted to vary.

There appeared to be a clear-cut relationship (r = .42) between the
color of illustrations and starting the article. There were two definite
breaks for the four categories in this variable. Thus, there were sharp
changes from “other” to the two categories, black and white and duotone;
and from black and white and duotone to full-color. It was clear that
the women in this study did not differentiate between black and white
and duotone but keenly preferred articles having full-color illustrations.
The color of illustrations seemed to be of greater importance to women
(r = .42) than to men (r = .28) in influencing them to start reading
articles in The Saturday Evening Post.

The relationship (r = .38) between sex of persons in the illustrations
and starting readership indicated that this variable also significantly
influenced the starting readership of Post articles by women readers.
Apparently, the woman reader of the Post preferred any type of illustration
other than that showing only men. The woman reader preferred illustrations
including both males and females to illustrations including females alone,
but this preference was slight. Again, female readers seemed to be more
influenced by sex of persons in illustrations than male readers (r = .22).

There appeared to be a significant inverse relationship (r = -.36)
between proportion of opening page(s) devoted to text and how many
women would start to read an article. It was apparent that devoting
less than 20 per cent of the opening page(s) to text resulted in the highest
starting readership. The differences among the three classes (i.e. classes
in terms of amount of space devoted to text) were clear-cut and more
distinct than in the male readership study. The general trend was for
starting readership to improve as the per cent of text on the opening
page(s) decreased.

There was greater variation among the classes of the subject matter
variable than of any other. A number of the categories had too few
cases to merit consideration as a separate class. This eliminated various
classes which are part of the gamut of subjects upon which Post articles
are written. These articles were classified under the category, “Other.”
It was found, however, after completion of the male readership study,
that the category, “Other,” could be further broken down into eight
additional subject matter categories, making a total of 24 classes in the
subject matter variable as compared to 16 categories in the previous
study.

By this revision, the correlation coefficient between subject matter 
and starting readership per cent was raised from .46 to .60, both coeffi- 
cients indicating clearly that the subject matter of an article considerably 
influenced the female reader to start it. It was apparent that the women 
(as well as the men) who read The Saturday Evening Post have definite 
likes and dislikes of Post topics. Although there was a steady increase 
in starting readership from topics least liked to those best liked, there 
were also several sharp changes grouping together both similar levels of 
preferences and similar kinds of subject matter. The general trend was 
for female starting readership to improve significantly when Post articles 
dealt with topics such as people at work, descriptions of peoples and 
places (USA), and health and hygiene. These topics revealed a pref- 
erence by female readers for human-interest articles. Action-type 
articles such as those on sports, athletes, and labor, which rated among 
the highest with male Post readers, offered less attraction to the women 
readers. 

The Determination of the Composite Effect. The correlation matrix 
is shown in Table 1. The horizontal and vertical headings indicate the 
five variables used in the study. Number of illustrations gave the lowest 
correlation (r = .35) with starting readership per cent, while the coeffi- 
cient between subject matter and starting readership was the highest 
(r = .60). 

Table 1
Intercorrelations Between Variables 1-5 and of Starting Readership Per Cent
(N = 190)

                      Starting    No.       Color     Sex       % Text on
                      Reader-     of        of        of        Opening     Subject
Variable              ship %      Illus.    Illus.    Persons   Page(s)     Matter

Starting
  Readership %          --         .35       .42       .38       -.36        .60
No. of Illus.           .35        --        .67       .11       -.29        .19
Color of Illus.         .42        .67       --        .15       -.46        .33
Sex of Persons          .38        .11       .15       --        -.16        .25
Per Cent Text on
  Opening Page(s)      -.36       -.29      -.46      -.16        --        -.25
Subject Matter          .60        .19       .33       .25       -.25        --


Each of the five variables correlated higher with the criterion (starting
readership) for female readers of the Post than for male readers. The
writer is unable to say at this time whether this fact suggests that women,
by and large, are more impressionable than men, or more consistent in
their interests, or were more greatly influenced by the particular variables
used in the study, or whether this increased relationship (over male
readership) resulted from the author’s coding. It is probably safe,
however, to conclude from the data in both this and the male study that
there is a significant sex difference in the readership habits of Post articles.

For prediction purposes the regression equation was computed. 
Table 2 shows the weights that each variable obtained. These weights 
are an approximation of the relative independent value of each variable 
to the success of the article (starting readership per cent). Use of this 
regression equation yielded an R of .70. The standard error of estimate 
was 9.7 per cent. Hence, the chances are that in about 68 out of 100 
cases the predicted starting readership per cents will be within an error 
of 10 points or less. We may be certain that very few starting readership 
estimates will be in error by more than 30 per cent. 
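The weights and the multiple R reported here can be approximately recovered from the correlation matrix in Table 1. The sketch below assumes that the Table 2 weights are standardized regression (beta) weights, obtained by solving the normal equations R_xx b = r_xy, and that R is the square root of the sum of each beta times its criterion correlation; the paper does not state its computing procedure, so this is an interpretation rather than the author's method. The helper `solve` and all variable names are introduced here for illustration.

```python
import math

PREDICTORS = [
    "No. of Illus.",
    "Color of Illus.",
    "Sex of Persons",
    "% Text on Opening Page(s)",
    "Subject Matter",
]

# Intercorrelations among the five predictors (Table 1, same order as above).
R_XX = [
    [1.00, 0.67, 0.11, -0.29, 0.19],
    [0.67, 1.00, 0.15, -0.46, 0.33],
    [0.11, 0.15, 1.00, -0.16, 0.25],
    [-0.29, -0.46, -0.16, 1.00, -0.25],
    [0.19, 0.33, 0.25, -0.25, 1.00],
]
# Correlation of each predictor with starting readership per cent.
R_XY = [0.35, 0.42, 0.38, -0.36, 0.60]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented working copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


betas = solve(R_XX, R_XY)
multiple_r = math.sqrt(sum(b * r for b, r in zip(betas, R_XY)))

for name, beta in zip(PREDICTORS, betas):
    print(f"{name:28s} {beta:6.3f}")
print(f"Multiple R = {multiple_r:.3f}")
```

Under this assumption the solution falls within about .01 of each weight in Table 2 and gives a multiple R close to .70; small differences are to be expected because the published correlations are rounded to two places.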

Table 2
Weights of Five Variables for Predicting Starting Readership Per Cent
(N = 190) (R = .70)

Variables                                            Weight

Subject Matter                                         .45
Sex of Persons in Illustrations                        .22
Number of Illustrations                                .15
Proportion of Opening Page(s) Devoted to Text         -.14
Color of Illustrations                                 .07


Calculation of the coded score weights (weights dependent upon the
measuring scale of the specific variable) gave the necessary data for the
regression equation. The final equation is as follows:


Predicted Starting Readership Per Cent Index
    = 14.9 (Index) + 1.9 × class value (No. of Illus.)
    + 2.3 × class value (Color of Illus.)
    + 6.1 × class value (Sex of Persons in Illus.)
    - 3.1 × class value (Proportion of Opening Page[s] Devoted to Text)
    + 4.2 × class value (Subject Matter).
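Applying the final equation to a single article is a weighted sum of its five class values. The following sketch uses the constant and coefficients exactly as printed; the dictionary keys, the function name `predicted_starting_readership` and the example class values are hypothetical, since the coding scheme that assigns class values to an article is described only in the earlier male readership paper.

```python
# Constant and coded-score weights from the final regression equation.
INTERCEPT = 14.9
WEIGHTS = {
    "no_of_illus": 1.9,
    "color_of_illus": 2.3,
    "sex_of_persons": 6.1,
    "text_proportion": -3.1,
    "subject_matter": 4.2,
}


def predicted_starting_readership(class_values):
    """Return the predicted starting readership per cent index for one article."""
    missing = set(WEIGHTS) - set(class_values)
    if missing:
        raise ValueError(f"missing class values: {sorted(missing)}")
    return INTERCEPT + sum(w * class_values[name] for name, w in WEIGHTS.items())


# Hypothetical class values, chosen only to exercise the function.
example_article = {
    "no_of_illus": 2,
    "color_of_illus": 3,
    "sex_of_persons": 1,
    "text_proportion": 2,
    "subject_matter": 5,
}
print(round(predicted_starting_readership(example_article), 1))  # 46.5
```

Keeping the weights in one mapping makes it straightforward to substitute revised weights, which matters because the weights are expected to change with time.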


Inasmuch as the correlation coefficients of four variables (not including
subject matter) and the resulting multiple, .70, could be higher for
predictive purposes, it is believed that there are other variables which
possibly are of greater importance for prediction than the ones under present
analysis. This is particularly evident from the fact that the multiple
correlation coefficient was raised only .10 when these four variables were
considered along with the subject matter variable. In view, however,
of the paucity of information (i.e. other variables) and perhaps the
difficulty of measuring them, consideration of the present variables,
individually but, more importantly, all together, can make noticeable
improvement in predicting starting readership.

The Cross-validation. To determine the extent to which the weights 
of the characteristics of articles would be valuable in years other than the 
year 1946, when the articles included in this study appeared, we have 
applied this regression equation to 149 articles appearing in the 1947 
issues of the Post. The correlation between the actual and predicted 
starting readership per cents was .73. This validity coefficient was 
slightly higher than the multiple (R = .70) and a reversal of the lower 
validity coefficient (r = .36) and the multiple (R = .56) obtained in 
the male readership study. 

The higher correlation predictions for women readers in this later 
year probably result from a constancy in interests over the period of the 
year intervening. The increased number of classes in the subject matter 
variable may also account for the slightly higher validity correlation 
coefficient. 

The average difference between the actual and predicted starting 
readership per cents was 8.6 per cent. The predicted starting reader- 
ship per cents were within 5 per cent of the actual starting readership in 
41 per cent of the articles, within 10 per cent in 68 per cent of the articles, 
and within 15 per cent in 86 per cent of the articles. 
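The accuracy summary above (average difference, and share of articles within 5, 10 and 15 points) can be computed from paired actual and predicted values. The sketch below is illustrative only: the function name `summarize_errors` is introduced here, the five data pairs are hypothetical and not taken from the 1947 articles, and "within" is assumed to include a difference exactly equal to the limit.

```python
def summarize_errors(actual, predicted, limits=(5, 10, 15)):
    """Return the mean absolute difference and the per cent of cases within each limit."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mean_abs = sum(errors) / len(errors)
    within = {
        limit: 100 * sum(e <= limit for e in errors) / len(errors)
        for limit in limits
    }
    return mean_abs, within


# Hypothetical starting readership per cents for five articles.
actual = [42, 55, 31, 60, 48]
predicted = [45, 43, 30, 52, 66]

mean_abs, within = summarize_errors(actual, predicted)
print(mean_abs)
print(within)
```

Run on the real 149 article pairs, the same function would be expected to return figures comparable to the 8.6 per cent average difference and the 41, 68 and 86 per cent shares reported in the text.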


The Applications 


The applications of this study are identical to those discussed in the
study on male readership. The primary application lies in checking the
value of a tentative layout for an article and making such layout changes
as are necessary to increase the average readership of each issue of the
magazine. Inasmuch as weights may change with time, continued
follow-up is essential.


Conclusions 
The following conclusions are supported: 


1. The multiple correlation and regression technique proved to be a 
successful method for predicting starting readership of Post articles by 
female readers. 

2. The accuracy of the predictions of future articles should fall within 
a 10 per cent difference between predicted and actual starting readership 
per cents in about 68 per cent of the cases. This percentage error is 
satisfactory for most practical purposes. 

3. The order of the relative importance of the five variables included 
in this study is (a) subject matter; (b) sex of persons in illustrations; (c) 
number of illustrations; (d) proportion of opening page(s) devoted to text; 
and (e) color of illustrations. 


Received August 31, 1948. 





Special Review 


Buros, Oscar Krisen. The Third Mental Measurements Yearbook. New 
Brunswick, N. J.: Rutgers University Press, 1949. Pp. xv, 1047. 
$12.50. 


Colossal is the word for this sixth and latest offering in the familiar 
series of bibliographical works on mental measurements edited by Buros. 
Starting with a modest 44-page listing of tests in 1935, the phenomenal 
growth of the series is charted in the following healthy chronology: 


1935: Educational, Psychological, and Personality Tests of 1933
and 1934 (44 pages)

1936: Educational, Psychological, and Personality Tests of 1933,
1934, and 1935 (83 pages)

1937: Educational, Psychological, and Personality Tests of 1936
(141 pages)

1938: The Nineteen Thirty-Eight Mental Measurements Yearbook
(415 pages)

1941: The Nineteen Forty Mental Measurements Yearbook
(674 pages)

1949: The Third Mental Measurements Yearbook (1047 pages)


As indicated, the rate of increase in volume has been tremendous and as 
yet shows no clear change in trend. One is reminded of the fable of the 
sorcerer’s apprentice; and one hopes that Buros has his wonderful 
editorial legerdemain under better control. 

The Third Yearbook follows the familiar pattern of the earlier models.
There are two main sections: “Tests and Reviews” and “Books and
Reviews.” The first of these, comprising over two-thirds of the volume,
is the section for which the Yearbooks are best known, and on which
their reputation is founded. The comparative statistics for the “Tests
and Reviews” sections of the three Yearbooks are shown in the
accompanying table.


Comparative Statistics of the Three Mental Measurements Yearbooks

                    1938                  1940                  Third
Period Covered      Jan. 1937-June 1938   July 1938-Oct. 1940   Oct. 1940-Dec. 1947
Entries             313                   503                   705
No. of Reviews      331                   503                   713
No. of Reviewers    133                   250                   320










It was originally planned that new editions of the Yearbook would 
be issued at two-year intervals, but this schedule was interrupted by the 
war. The first post-war model, therefore, covers a period of some seven 
years—a fact which, in part at least, accounts for its gargantuan dimen- 
sions and excuses its minor sins of omission. The “Tests and Reviews” 
section of the Third Yearbook lists 663 tests (plus 42 references to books 
about single tests, e.g. the Rorschach) of which about 70 per cent are 
accompanied by one or more original reviews. Altogether, 713 reviews 
are contributed by 320 psychologists, educationists, subject-matter ex- 
perts, classroom teachers, and test technicians. Included also are 66 
excerpts from reviews which have already appeared elsewhere and (as
claimed in the preface—this reviewer did not count them) “3,368 references
on the construction, validity, use, and limitations of specific tests.”

The tests listed and reviewed purport to be all of the “commercially 
available tests—educational, psychological, and vocational—published 
as separates in English-speaking countries between October 1940 and 
December 1947.” In addition are included a selected list of “classics” 
(e.g. the Army Alpha, Stenquist Mechanical Aptitude, Strong Vocational 
Interest Inventory, etc.) plus a few tests published during the 15 years 
since Buros first started fishing in these waters, but which somehow “got
away” before.

For each test entry, the following useful information is provided, all
condensed into a few lines: test title; description of groups for which
intended; date of publication, copyright, or revision; whether or not
machine scorable; whether individual or group test; forms, parts, and
levels available; cost; testing time and total administration time; author;
and publisher. A specimen entry is:


American Council on Education Psychological Examination for High
School Students. Grades 9-12; 1933-1947; new form issued annually;
IBM; separate answer sheets must be used; $2.00 per 25 tests; 50¢ per
25 machine-scorable answer sheets; 50¢ per specimen set; 35 (65) minutes;
L. L. Thurstone and Thelma Gwinn Thurstone; Educational Testing
Service.

Following this outline, for most test entries, are cross references to 
earlier Yearbooks or bibliographies in the series, and references to books 
and articles covering some aspect of the test. Then comes the feature
for which the series is best known—the original review. About a third of
the entries include reviews by more than one contributor. But more 
about the reviews later. 

The second main section of the volume—the “Books and Reviews” 
section—lists “549 books on measurements and closely related fields,”
accompanied in most cases by excerpts from reviews of these books 
culled from the journals. Here again the attempt has been to include
all works in the area published in English-speaking countries between
October 1940 and December 1947. And here, as throughout the book, 
the emphasis is on critical evaluation. The editor’s preface states in 
this connection: 


“Reviews which included no critical comment are listed but not excerpted.
Readers should note that the critical portions of all book reviews, regard-
less of merit, found in professional and scholarly journals are included in
this yearbook. Asterisks and ellipses within excerpts indicate the omission
of non-evaluative material which appeared in the original review.”


The selection of this material was presumably the sole responsibility 
of the editor. Inclusion of all the words written about all of the books 
listed would obviously have been both impossible and ridiculous. On 
the other hand, wherever judgment must be employed in the selection and 
editing of material, one is entitled to ask questions about the basis of the 
selection, and to be suspicious of possible conscious or unconscious bias. 
In this case, the editor assures us that all the critical appraisals of the 
books listed, collected from all the reviews of these books appearing in 
the professional and scholarly journals, have been included, and only 
the purely descriptive or non-evaluative reviews or parts of reviews have 
been left out. This being true, the reader might conclude that Buros’ 
own Nineteen Forty Mental Measurements Yearbook, since it accounts for 
50 reviews in 15¼ pages, was either the most important or the most
criticized—surely the most controversial—book on the subject published
since 1941. Sharers of the honor of evoking the most critical comment
would be the two volumes of Diagnostic Psychological Testing by Rapaport
et al. (13 reviews, 16¼ pages) and Stoddard’s The Meaning of Intelligence
(13 reviews, 10 pages). 

In addition to the two major review sections covering tests and books, 
the volume contains five directories and indices. The first of these, 
Periodical Directory and Index, serves both as a key to the abbreviations 
used throughout in journal references and as a directory of journal editors. 
The second, Publishers Directory and Index, gives addresses of test and 
book publishers. The Index of Titles and the Index of Names are con-
ventional alphabetical listings. Finally, the Classified Index of Tests is
an expanded table of contents for the “Tests and Reviews” section,
listing each entry numerically (1-705). 

The reputation of the earlier Yearbooks derived principally from the 
“Tests and Reviews” sections; and the same will doubtless be true of the
latest volume in the series. The major rubrics are essentially the same
as those employed in the earlier issues: Achievement Batteries (22
entries); Character and Personality (91 entries); English (57 entries); 
Fine Arts (7 entries); Foreign Languages (36 entries); Intelligence (89
entries); Mathematics (62 entries); Miscellaneous, e.g. home economics,
safety, computational and scoring devices (111 entries); Reading (70 
entries); Science (44 entries); Social Studies (30 entries); and Vocations 
(86 entries). It is difficult to know what significance, if any, to attach to 
the numbers of entries in each category. Perhaps they illustrate the
difficulties of fitting fabricated classifications to any given series of data, 
particularly when the data present themselves without regard to the 
principles on which the classification was originally compounded. When 
this occurs, the pattern must be stretched here and there, and the miscel- 
laneous section originally provided for overflow inevitably grows bigger 
and bigger. In spite of this—and assuming the listings are as compre- 
hensive as claimed—it appears that the war and post-war periods have 
provided an atmosphere more congenial to production in the field of 
character and personality than elsewhere. This conclusion might be 
somewhat misleading, however, since one test alone accounts for nearly 
a quarter of all the entries under this heading. Needless to say, it is the 
Rorschach which somehow merits 67 pages, including a bibliography of 
598 titles! 

The original reviews themselves appear to this reviewer as a varied 
lot having only one factor in common—all are critical. Criticism is, in 
fact, the dominant tone of the whole volume (nil nisi bonum is definitely 
not the editor’s watchword!) and while going through it page by page 
one may conjure up an image of the editor at the head of his ranks of 
contributors daring the would-be test maker to attempt to get away 
with anything shoddy or unscrupulous. The image is an inspiring one, 
and, though fanciful, not too remote from the editor’s intention. One 
of the major objectives of the Yearbooks, in fact, is: 

“To impel authors and publishers to place fewer but better tests on the
market and to provide test users with detailed and accurate information
on the construction, validation, uses, and limitations of their tests at the
time that they are first placed on the market.”


To achieve this objective, the editor has instructed his cooperating
reviewers to provide reviews that are “. . . frankly critical with both
strengths and weaknesses pointed out in a judicious manner.” Just
how “judicious” are such randomly selected remarks as:


“This is just another test for neurotic tendencies. The reviewer can
think of no reason for its publication or use. . . . The only excuse for
publishing another test of neurotic tendency in this day and age is in-
creased validity over other tests in the field. This test is grossly lacking
in this respect.”

“With the perspective attained in the years since its publication . . . one
may view (test maker’s) arrant nonsense with tolerant amusement.”







or, again: 


“The instrument is a reversion to a type of psychological and sensory 
testing that belongs to the infancy of mental measurement, and has 
repeatedly been proved worthless as an index to higher mental ability.” 


the reader must judge for himself. If such candid criticisms are indeed 
warranted, the reviewers deserve full praise for their courage in saying
so. Fortunately, a good many of the reviewers have displayed this kind 
of forthright frankness. Certainly, this reviewer is not advocating 
libelous brutality for the sheer sadistic enjoyment of contemplating the 
discomfiture brought about by a well-placed literary needle. But it is 
difficult to discern what value might accrue to the potential test user from 
that type of review which finds a little to praise and a little to blame in 
every test and sums up with an equivocal statement of possible usefulness 
as a crutch for intuitive hunches. Fortunately, this sort of review is
not found very often in the Yearbook. 

But while the general tone of the book is healthily evaluative, the 
basis of criticism varies considerably among the reviews. They might 
well be classified under headings suggested in an excellent review of the 
1940 Yearbook (Pedro T. Orata in The Teachers College Journal (Manila), 
1941, 3, 59-61): 

1. Emphasizes functional or “true” validity, criticizes test for success or
failure to measure ultimate educational objectives . . . in measuring
worthwhile results of instruction; and in general subordinates the tech-
niques of test construction and mechanics of administration, scoring,
and tabulation of scores to the higher values that the test and testing
should engender in the pupils and in those who use it.

2. . . . criticizes the test mainly for success or failure to meet the require-
ments of statistical validity and reliability, evaluates it on the basis of
commonly accepted techniques of test construction, and in general
assumes functional validity or subordinates it to formal content and
make-up of the test.

3. . . . evaluates it mainly from the point of view of its success or failure
to meet the mechanical requirements of efficiency in scoring, adminis-
tration, and tabulation of test results.


Each of these three points of view has merit, to be sure, and each could 
doubtless rally a considerable number of supporters to its side. In fact, 
it is probably more to the point to classify reviewers in this manner than 
to classify their products. And herein lies the fundamental weakness of 
the Yearbook in the opinion of this reviewer. Granting that all of the 
contributors to the volume are experts and qualified to speak and be 
heard, they are not all equally sensitized to all phases of psychometry. 
In diagnosing human ailments, we don’t call in only the internist or the 
neurologist. Nor should we, in examining the test, expect the subject- 
matter expert to detect statistical ailments, or the psychometrist to point 
up an undernourished teaching objective. We should call in all the
specialists and hold a thorough clinical examination on each case. That
something like this was intended is indicated by the editor’s statement of 
objectives in the 1940 Yearbook: to provide reviews “written by persons
of outstanding ability representing various viewpoints. . . .” But in
calling in the experts, the editor has made the assignments of cases, which 
implies that he already knew the patients’ needs. Though some such 
procedure is a practical necessity in a venture of this kind, it has the 
definite disadvantage that the treatment of the various types of tests is 
apt to be unbalanced. A cursory survey, for example, indicates that 
nearly all the reviewers assigned to the Achievement Batteries are educa- 
tionists, educational researchers or examiners, and that psychologists and 
psychometrists predominate among the reviewers of Intelligence tests. 

After many hours spent in contemplating the somewhat frightening 
aspect of the Third Yearbook, this reviewer found himself musing about 
the practicability of another kind of volume. This “dream” book would 
not attempt to reproduce verbatim the literary efforts of the experts, 
but would edit and cull from them all the essential materials to fill out 
a standard outline. Spared the necessity of literary composition, re- 
viewers could concentrate on specifics, and could handle more tests with 
no greater expenditure of effort. The outline itself would be drawn up by 
a board of outstanding specialists including both test makers and test 
users. It would cover such points as: type of item; sources of items; 
nature of item analysis; descriptions of populations used for item analysis, 
factorial analysis, validation, cross-validation, standardization; judgments
of functional validity; adequacy of “coverage”; et cetera, et cetera.
This list is obviously not exhaustive, and many readers may detect a 
statistical bias in it. It is for precisely this reason that the board of 
experts would be used to insure the inclusion of all important dimensions 
of a test. Finally, there would be the main feature of the book—the 
board of experts’ “seal of approval” for tests which merited adoption
and use. In this last connection, the reviewer is reminded of the state- 
ment made by Sandiford in commenting on the earlier Educational, 
Psychological, and Personality Tests of 1936 (American Journal of Psy- 
chology, 1938, 51, 200): “. . . Professor Buros’ annual publication 
would be made much more useful if he would mark with a prominent 
star those (tests) which were valid, reliable, and had satisfactory norms. 
Then busy workers could neglect the rest, or if they wasted their money 
on ‘gold bricks,’ the fault would be their own.” This reviewer can think
of no better way of achieving the objectives of fewer and better tests. 


E. Donald Sisson 
Personnel Research Section, AGO, 
Department of the Army. 





Book Reviews 


Ahern, Eileen. Survey of personnel practices in unionized offices. Research
report number 13. New York: American Management Association,
1948. Pp. 38. $1.50 (non-members, $3.00).


This report consists of twenty frequency tables and accompanying 
text relating to practices in unionized offices in matters of union security, 
salaries, hours of work, leave of absence, group insurance, seniority, 
discharge, grievance adjustment, and other collective bargaining subjects. 
The report is based on 50 union contracts believed to be fairly repre- 
sentative of the entire AMA collection of 300 office union contracts. 

The report will be of interest to only a few psychologists. Those 
who are concerned with collective bargaining with office unions or those 
who wish to compare their practices with those obtained by employees 
through collective bargaining will find the report of some interest subject 
to the limitations imposed by a sample of 50 cases and sub-group tabu- 
lations based on an N ranging from five to eight. 


C. E. Jurgensen
Minneapolis Gas Company


Achilles, Paul S. Management and the psychologist: A practical guide on
psychology for the business executive. Section II, Book 4, Reading 
Course in Executive Technique, Ed. by Carl Heyel. New York: 
Funk and Wagnalls Co., 1948. Pp. 64. $1.00. 


The sub-title is an exact description of the contents of this little book. 
The presentation is concise yet it is surprisingly comprehensive. It is 
authoritative, readable and accomplishes its purpose in admirable fashion. 
It is just the type of book to place in the hands of the business executive 
who has never been exposed to formal psychology but who may be 
curious as to just what our discipline is all about. 

Only one minor criticism would appear to be justified. Having 
whetted the appetite of the business executive, Achilles might well have 
added a short selected annotated bibliography for his guidance in case he 
might desire to pursue further any phase of the subject. 

Donald G. Paterson
The University of Minnesota


Linebarger, Paul M. A. Psychological Warfare. Washington, Infantry 
Journal, 1948. Pp. 259. $3.50. 


The purpose of this book is to tell a layman audience what psychological
warfare is and how it is fought. Linebarger is Professor of Asiatic
Politics at the Graduate School of Advanced International Studies in
Washington, D. C. He served in the War Department and in OWI, both
in policy formulation and in field operations.

The book handles psychological warfare in three parts. Each part 
has three to six chapters. In the first part, Linebarger covers historical 
examples, definitions, limitations and characterizations of national uses of 
psychological warfare in World Wars I and II. The second part is 
devoted to how to analyze and derive military intelligence from propa- 
ganda in order to make an objective appraisal of a given situation in terms 
of psychological warfare. The third part includes organization, plans,
operations, and remarks on future problems. 

The strong point of the book is the waggish style. This is exem- 
plified when he pokes fun at the high level policy echelons wherein much 
of the output was classified top-secret and thus removed from usefulness. 
There are seventy excellent figures of propaganda leaflets as well as ten 
organizational charts of various national offices involved in psychological 
warfare. The content is enlivened with descriptions of events such as 
the use of radio-phones in tank warfare to induce Japanese surrenders. 
The three major U. S. lessons from World War II are, he says, that 
atrocity propaganda does not pay, that we have no backlog of trained 
propaganda personnel, and that psychological warfare must be a positive 
function at command level, not a sideline specialty apart from top level 
policy making. 

A weak point of this book is its lack of organization despite the 
promise of the excellent chapter headings. Specific techniques, the root of 
the entire matter from a professional view, are mentioned as the story de- 
velops. They are not consolidated for a comparative analysis of their uses 
and limitations. There is an unusual mixture of the levels of vocabulary. 
Such words as condign, maleficent, and oestrous occur as well as frequent 
references to people going mad with confusion and serious use of Frisco 
for San Francisco. Use of the revised Flesch formulas shows readability
as difficult and style as mildly interesting. Linebarger epitomizes and 
tends to rest content with neatly turned phrases. For example, he makes 
the point that education is to psychological warfare what a glacier is to 
an avalanche. He neglects to show the crucial differences in bias, in 
use of segmental appeals, and in emotional and authoritarian contexts. 
Professional psychologists may wonder if his two-page discussion of the
role of the psychologist in warfare justifies use of “psychological” in the 
title. The location of the eighty illustrations with reference to the text 
might have been improved. 

In summary, Linebarger’s book presents “a patchwork of enthusiastic
recollection” as he calls it. Although some professional readers
may be disappointed, the fact that it is a lively entry in a relatively
undeveloped field makes the book worthwhile for his intended audience. 


Clark L. Hosmer 
Lt. Col. U. S. Air Force 


Terman, L. M., and Oden, Melita H. The gifted child grows up: Twenty- 
five years’ follow-up of a superior group. Stanford, California: Stan- 
ford Univ. Press, 1947. Pp. xiv, 448. $6.00. 


As stated in the preface, the volume “is an over-all report of the work
done with the California group of gifted subjects from 1921 to 1946, the 
greater part of it being devoted to a summary of the follow-up data ob- 
tained in 1940 and 1945; at the latter date the average age of the group 
was approximately thirty-five years.” 

The first six chapters are a resume of the earlier work, reported in 
more detail in two previous monographs. When selected in 1921-3, the 
1050 pre-high-school subjects had an average chronological age of 9.7 
years, and the 420 high school cases, 15.2 years; I. Q.’s ranged from 135 
to 200 with a mean of about 150. It was estimated that the group was 
in the highest one per cent in ability as measured. Thirty-one per cent 
of the fathers were professional men; 60% of the homes were rated as 
superior; relatives included many individuals of note. In 1923, thirty- 
seven anthropometric measurements of 59 per cent of the cases showed 
that “in all respects . . . the selected group was slightly superior physically
to the various groups used for comparison.” Health histories and
medical examinations showed health to be better and defects less common 
as compared with the average child; puberty tended to be reached a little 
earlier. In school 85 per cent were accelerated in grade placement; 
nevertheless, tests showed over half to have mastery of subject matter 
two grades yet further ahead. Interests of the gifted were livelier, more 
mature, more intellectual, and somewhat more social than for average 
children. Tests and ratings of character traits also showed superiority 
of the gifted. A second survey six years later yielded results substantially 
in agreement with the first findings. 

Chapters 7-19 are concerned with follow-ups in 1940 by inquiry forms 
and field workers where possible, and by inquiry forms in 1945-6. That 
cooperation was outstanding is indicated by returns of information from 
93 per cent of all living subjects. Mortality was to date found less than 
for the general population, physique and health superior, and maladjust- 
ment, delinquency and insanity less frequent than in the general popula- 
tion. Of the total group, 70 per cent of the men and 67 per cent of the 
women had by 1945 graduated from college (as compared with 5 per 
cent of the general population); 34 per cent of the men took one or more
graduate degrees; academic records were superior, median age of graduation
was over a year younger than usual. Nevertheless, gifted students
participated more in extra-curricular activities than the average student. 
Approximately 71 per cent of the men were in professional or superior 
business occupations in 1940, or 5 times as many as for California men in 
general; income was higher than for college graduates in general. Avoca- 
tional interests were diverse and rich. Attitudes were middle-of-the-road. 
More had married and at earlier ages than for college graduates in general, 
but divorce was less than half as frequent; happiness in marriage was 
rated high, and sex adjustment appeared in no way atypical. 

Chapters 20-26 deal with somewhat special problems. Accelerates 
in school were found greatly to excel non-accelerates in the group, in
achievement on a test battery in 1922; in 1940, over twice as many 
accelerates were in the top group in vocational success. Accelerates 
married earlier, and appeared not handicapped in adjustment or in 
physical or mental health. Special study of the subjects with I. Q.’s of 
170 or above shows them “about as successful as lower testing subjects in
social adjustments,” and they accomplish more. Subjects of Jewish 
descent differed little from the non-Jewish “except in their greater drive
for vocational success, their somewhat greater tendency toward liberalism
in political attitudes, their somewhat lower divorce rate.” A vigorous
chapter on factors in the achievement of gifted men showed the most 
successful distinguished primarily not by intelligence but especially by 
drive to achieve, and by all-round adjustment; outstanding accomplish- 
ment was not associated with marked emotional tensions but rather 
with stability and freedom from excessive frustration. War records 
were good. A careful chapter on the appraisal of achievement emphasized
the variety of possible values, the possibility that admirable achievement
might not involve eminence, and the need for later data if appraisals
of accomplishment are to be adequate. The final chapter stresses the 
importance of future follow-ups, and over-views the total investigation 
in larger perspectives. 

In total, then, the volume is an outstanding example of that most rare, 
but probably most valuable type of psychological investigation—the 
broadly conceived, long-time developmental study. The subjects were 
that portion of the total population most valuable to society. For all 
who are interested in problems of human personality in its finest poten- 
tialities, or the most challenging opportunities in education and guidance, 
the volume should be a “must.”

Sidney L. Pressey
Ohio State University





New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should 
be sent to Donald G. Paterson, Editor, Department of Psychology, 
University of Minnesota, Minneapolis 14, Minnesota.


ABC’s of scapegoating. Revised edition. Gordon W. Allport. New 
York: Anti-Defamation League of B’nai B’rith, 1948. Pp. 56. $.20.

Practical psychology. Karl S. Bernhardt. New York: McGraw-Hill 
Book Co., Inc., 1948. Pp. 319. $2.50. 

Industrial psychology and its social foundations. Milton L. Blum. New 
York: Harper and Brothers, 1949. Pp. 518. $4.50. 

Student personnel services in general education. Paul J. Brouwer. Wash- 
ington, D. C.: American Council on Education, 1949. Pp. 317. $3.50. 

Managers, men and morale. Wilfred B. D. Brown and Winifred Raphael. 
London, Eng.: MacDonald and Evans, 1948. Pp. 163. 10/6. 

The third mental measurements yearbook. Oscar K. Buros, Editor. New
Brunswick: Rutgers University Press, 1949. Pp. 1047. $12.50.

Personal adjustment in old age. Ruth Shonle Cavan, Robert J. Havighurst,
Ernest W. Burgess, and Herbert Goldhamer. Chicago: Science
Research Associates, 1949. Pp. 175. $2.95.

The psychology of social classes. Richard Centers. Princeton: Princeton 
University Press, 1948. Pp. 432. $5.00. 

Psychologist unretired. Miriam Allen deFord. Stanford: Stanford Uni- 
versity Press, 1948. Pp. 127. $3.00. 

Film and education. Godfrey Elliott, Editor. New York: Philosophical
Library, 1948. Pp. 597. $7.50. 

The energetics of human behavior. G. L. Freeman. Ithaca: Cornell 
University Press, 1948. Pp. 352. $3.50.

Man’s place in God’s world. Sol W. Ginsburg. New York: Hebrew 
Union College, Jewish Institute of Religion, 1949. Pp. 30. $.50. 

4-Square planning for your career. S. A. Hamrin. Chicago: Science 
Research Associates, 1948. Pp. 200. $2.95. 

Adolescent character and personality. Robert J. Havighurst and Hilda 
Taba. New York: John Wiley and Sons, Inc., 1949. Pp. 315. 
$4.00. 

How to create job enthusiasm. Carl Heyel. New York: McGraw-Hill 
Book Co., Inc., 1948. Pp. 248. $3.00. 

Psychology and ethics. Harry L. Hollingworth. New York: Ronald 
Press Co., 1949. Pp. 247. $3.50. 




Applied psychology. Revised edition. Richard Wellington Husband. 
New York: Harper and Brothers, 1949. Pp. 845. $4.50. 

Theory and problems of social psychology. David Krech and Richard 
Crutchfield. New York: McGraw-Hill Book Co., Inc., 1949. Pp. 
639. $4.50. 

Discovering your real interests. G. Frederic Kuder and Blanche B. 
Paulson. Chicago: Science Research Associates, 1949. Pp. 48. 
Single copy, $.75. Fifteen or more copies, $.60. 

Personality projection in the drawing of the human figure. Karen Mac- 
hover. Springfield, Ill.: Charles C. Thomas, Publisher, 1949. Pp. 
181. $3.50. 

Psychological statistics. Quinn McNemar. John Wiley and Sons, Inc.,
1949. Pp. 364. $4.50.

Workers wanted. E. William Noland and E. Wight Bakke. New York:
Harper and Brothers, 1949. Pp. 224. $3.00.

The procurement and training of ground combat troops. Robert Palmer, 
Bell I. Wiley, and William R. Keast. Washington, D. C.: Super- 
intendent of Documents, U. S. Government Printing Office, 1948. 
Pp. 696. $4.50. 

Industrial hygiene and toxicology. Volume I. Frank A. Patty. New 
York: Interscience Publishers, Inc., 1948. Pp. 531. $10.00. 

Machine computation of elementary statistics. Katharine Pease. New 
York: Chartwell House, Inc., 1949. Pp. 238. $2.75. 

Job horizons. Lloyd G. Reynolds and Joseph Shister. New York: 
Harper and Brothers, 1949. Pp. 102. $2.25. 

Human relations in an expanding company. Frederick L. W. Richardson 
and Charles R. Walker. New Haven, Connecticut: Yale Labor and 
Management Center, 1948. Pp. 95. $1.50. 

Company annual reports to stockholders, employees, and the public. Thomas 
H. Sanders. Boston: Division of Research, Harvard Business School, 
1949. Pp. 338. $3.75. 

An outline of social psychology. Muzafer Sherif. New York: Harper 
and Brothers, 1948. Pp. 479. $4.00. 

Government regulation of industrial relations. George W. Taylor. New 
York: Prentice-Hall, Inc., 1948. Pp. 383. $4.00. 

Social class in America. W. Lloyd Warner and Kenneth W. Eells. 
Chicago: Science Research Associates, 1949. Pp. 292. $4.25. 

Constructing classroom examinations. Ellis Weitzman and Walter J. 
McNamara. Chicago: Science Research Associates, 1949. Pp. 140. 
$2.50. 

Human behavior and the principle of least effort. George Kingsley Zipf. 
Cambridge: Addison-Wesley Press, Inc., 1949. Pp. 650. $6.50. 







Symposium on industrial relations. American Journal of Sociology. 
January 1949 issue. Chicago: University of Chicago Press, $1.25. 

Factors affecting the satisfactions of home economics teachers. AVA Re- 
search Bulletin No. 3. Washington, D. C.: Committee on Research 
and Publications, American Vocational Association, Inc., 1948. Pp. 
96. $.75. 

Hours of work and output. Bulletin No. 917. Bureau of Labor Statistics.
Washington, D. C.: Superintendent of Documents, U. S. Government
Printing Office, 1948. Pp. 160. $.35. 

Operating under the LMRA, relation of wages to productivity. Personnel
Series Number 122. New York: American Management Association, 
1948. Pp. 63. $1.25. 

Sociometry and group relations. Work of Progress Series. New York: 
American Council on Education, 1948. $1.25. 

The open house in industry. Chicago: National Metal Trades Association, 
122 South Michigan Avenue, 1948. Pp. 27. 

The UAW-CIO looks at time study. Detroit: UAW-CIO Education 
Department, 28 West Warren Street, 1947. Pp. 32. $.50. 

Employees suggestion programs in the iron and steel industry. New York: 
American Iron and Steel Institute, 350 Fifth Avenue, 1948. Pp. 92. 

Air conditioning in textile mills: the case for temperature and humidity 
control to provide comfort, health, safety, and optimum production. New 
York: Research Department, Textile Workers Union of America, 99 
University Place, 1948. Pp. 60. 























1948 DIRECTORY 


AMERICAN PSYCHOLOGICAL ASSOCIATION 


1515 MASSACHUSETTS AVENUE NORTHWEST 
WASHINGTON 5, D.C. 




This directory gives biographical data and fields of 
interest for the Fellows and Associates of the American 
Psychological Association. Membership lists for the 
Divisions of the Association, the by-laws, a list of past 
officers and meeting places, and a geographical and 
institutional index of members are included. The 
directory is edited by Helen M. Wolfle of the Association 
staff. 


According to present plans of the Association, another 
directory including biographical data will not 
be published until 1951. The interim issues will contain 
the names of the members, their addresses, and 
their present positions. 


438 pages 




















Special Pre-Publication Offer to 
Readers of the Journal 








The Encyclopedia of 


CRIMINOLOGY 


Edited by Dr. Vernon C. Branham and Dr. Samuel B. Kutash 


Michael J. Pescor, M.D.; Walter C. Reckless, Ph.D.; Orlando F. 
Wolf, LL.D.; Arthur L. Wood, Ph.D. 


Publication date is June, 1949. If you send in your order before 
publication, you can get your copy for ONLY $10.00 


PHILOSOPHICAL LIBRARY, Publishers 
15 East 40th Street, Dept. 186, New York 16, N.Y. 


ENCYCLOPEDIA OF CRIMINOLOGY at the special Pre-Publication 
price of $10.00 per copy. The book(s) to be mailed 


NAME. 





ADDRESS. 
(Expedite shipment by enclosing remittance) 











work will serve the needs of all those con- 


in a variety of dis- 


PSYCHOLOGY SOCIOLOGY MEDICINE 
LAW POLICE SCIENCE EDUCATION HISTORY 
PENOLOGY PHILOSOPHY RELIGION RESEARCH 


CONTRIBUTORS INCLUDE: Hon. Justice Francis Bergan; Nathaniel Cantor, 
Ph.D.; Hervey Cleckley, M.D.; Marshall B. Clinard, Ph.D.; Victoria 
Cranford; Senator Thomas C. Desmond; Arthur N. Foxe, M.D.; G. L. Ed.D.; 
Eleanor Glueck, Ph.D.; Sheldon Glueck, Ph.D.; Leland E. Hinsie, M.D.; Hon. 
erholser, M.D.; 


Publication: June, 1949. Approx. 500,000 words. $12.00 
SPECIAL ORDER COUPON 










by Dale Yoder, Donald G. Paterson, 
Herbert G. Heneman, Jr., C. Harold Stone, et al. 


Describes the results and methods used in the study of the St. 
| im sea camplng with conning emphasis on such techniques as 


and clinical counseling of unemployed and relief 
ent indexes. 





Fy pags, ce icika Wide taadiod ibaa. 


Jobs in Industrial Relations, Bulletin 3. 





of of the relations 
ob macape 2 some +! idee women jobs in labor and 


gi et aek tin ak ser dick Bulletin S. 


An annotated reference list of a minimum library of tative books 
and periodicals in industrial relations. $.75. 


Industrial Relations Glossary, Bulletin 6. 


Terms and definitions of selected technical words and phrases used in the 
field of industrial relations. $.75. 


Training and Research in Industrial Relations, I; II; III, Bulletins 1, 4, and 7. 
of annual conferences attended by tives of indus- 


ng oem eat common 
Tinaech wah Odea 
PRA ag wml ae ar chee Research and Technical Report 3. 


the physical employee handbooks, 
forms to the rule of good typography in attaining maximum reads. 


Other Publications 


The Program of the Industrial Relations Center, no charge. 

Jobs for All: A Primer of Theory, Bulletin 3, $1.00. 

Ten Years of the Minnesota Labor Relations Act, Bulletin 9, $1.00. 
Accounting Methods for Local Unions, Research and Technical Report 1, $1.00. 
How To Make a Wage Survey, Research and Technical Report 2, $1.00. 


The Industrial Relations Center - University of Minnesota 


Bulletins may be ordered from: 
University of Minnesota Press 
Box 33, Nicholson Hall 
Minneapolis 14, Minnesota 

Research and Technical Reports may be ordered from: 
Professional Colleges Bookstore, Box 20 
University of Minnesota 
Minneapolis 14, Minnesota 














